ABSTRACT

This thesis describes a model diagnostic problem and a computer 
program designed to deal with this problem. The model diagnostic 
problem is an abstract problem. A major contention of this thesis, 
however, is that this problem subsumes the principal features of a 
number of ostensibly different real diagnostic problems including 
certain problems of medical diagnosis and the diagnosis of machine 
failures. A second major contention of this thesis is that strate- 
gies for the solution of the model diagnostic problem can be formu- 
lated in terms sufficiently explicit to permit their incorporation 
in a computer program. 

The model diagnostic problem assumes that the system being di- 
agnosed (e.g. a person, or machine) is in one of a finite number of 
known states. Tests can be performed at some cost to discover
attributes of the system, for example signs or symptoms in medical
diagnosis. The current state of the system is to be deduced from the
observed attributes and past experience with similar systems. In the
model, this experience is represented principally in terms of proba- 
bilities (e.g. the conditional probability of a certain attribute 
given the system state). 

The statement of the model diagnostic problem requires that the 
diagnostician also account for the cost of various misdiagnoses. In
particular, for each pair of states i and j, the cost of misdiagnosing
state j as state i, l_ij, is given. Thus the diagnostician must balance
the cost of performing additional tests against the expected
reduction in the cost of misdiagnosis. This requirement suggests the 
value of sequential diagnosis.




A computer program was developed to solve the model diagnostic 
problem. It consists of 1) an inference function which is based on 
a Bayesian analysis of attributes and includes a flexible way of 
dealing with non-independent attributes, 2) a pattern-sorting function
which allows the program to detect irrelevant attributes and patterns 
of attributes corresponding to two different system states, and 3) 
a test selection function which employs various heuristics to select 
good tests for the user of the program to perform on the system under 
consideration. The diagnostic program is specialized for a particular 
problem by providing it with the appropriate experience. The pro- 
gram is embedded in an environment (set of programs) which facili- 
tates the study of various diagnostic strategies. 

The diagnostic program was implemented on the time-sharing
system at Project MAC. It was applied to two medical problems, the 
diagnosis of congenital heart disease, and the diagnosis of primary 
bone tumors. The results obtained here suggest 1) that a computer 
program can be of considerable value as a diagnostic tool, and 2) 
that it is quite advantageous for such a program to perform sequential 
diagnosis as it interacts with the user.

Chapter 1
DIAGNOSTIC PROBLEMS AND PROCESSES 

There are many problem areas in which attention is focused
on some system. In these areas, the principal problem is to ascertain
the current state of the given system. In general terms such
a problem is a diagnostic problem. The problem-solver or diagnosti- 
cian is equipped for his task with information distilled from past 
experience with such systems, and he attempts to couple this gen- 
eral knowledge with specific observations or tests of the given 
system in such a way that he can deduce the identity of the current
state. The extent of the general knowledge, its organization, 
and the particular manner in which it is brought to bear on the
diagnostic problem, the diagnostic process, may vary considerably
among different problem areas, but the general nature of the prob- 
lem persists. 

Thus the medical diagnostician deals with the problem of dis- 
covering the "state" of the patient. Through training and experi- 
ence, the physician has learned the sign and symptom patterns asso- 
ciated with possible diseases from which the patient can suffer. One 
problem is the effective utilization of this experience which is 
framed in terms of the abstraction of the disease and the reality of 
the individual patient. An additional complication arises from the
fact that different diseases may result in similar signs and symptoms.
The physician exploits his general knowledge or experience
in the selection of a sequence of tests to apply to the patient. 
The results of these tests provide him with information from which 
he constructs a more complete picture of the health of the patient. 
These tests may include simple questions as in the history-taking
or complicated medical procedures such as in an exploratory operation.
Since tests may exact a high cost (in terms of risk to the patient,
patient discomfort, the time of skilled persons, money, etc.),
it is the additional task of the diagnostician to properly balance 
this cost against the potential usefulness of the test results. For 
these and other reasons, medical diagnosis is often a complex and 
difficult intellectual problem. 

A second example of a diagnostic problem is that of debugging 
computer programs. A program containing one or more errors can be 
thought of as a system for which it is desired to determine the state. 
The state in this case is characterized by the particular combination 
of errors. The programmer brings his past experience with a variety 
of programs to bear on this diagnostic problem. By controlling the 
inputs to the program, applying traces, or altering instruction se- 
quences, or employing a post mortem, he can perform a range of tests 
on the program. The results of these tests may suggest new tests as 
well as providing the programmer with new insight into the problem 
currently confronting him. Like medical diagnosis, program debugging
is often a difficult task, requiring considerable judgment both in
the selection of tests and the interpretation of results. 

The research reported here is concerned with a particular diag- 
nostic problem and a diagnostic process for solving that problem. It 
has several aims. The first is to formulate the model of the diagnos- 
tic problem in such a way that the definition subsumes the principal 
features of problems in a number of ostensibly different problem 
areas. For example, the definition might apply both to medical diag- 
nosis and to program debugging, although it might not be the particu- 
lar definition employed by diagnosticians in the respective areas. 
That such a model can be formulated is the major contention of this 
thesis. The second aim is to develop and investigate strategies for 
the solution of this model diagnostic problem. Because they are to 
be stated in terms of an abstract problem, such strategies will be 
independent of any real diagnostic problem. These diagnostic pro- 
cedures then are to be embodied in a computer program. This step 
serves two purposes. First, the program provides an explicit state- 
ment of the diagnostic strategies, and thus facilitates the testing 
of these strategies on particular problems. Second, if the strate- 
gies in the program prove effective in practical applications, the 
program could be of considerable value in computer-aided diagnosis. 
In the event that this approach were successful, the resulting program
would be useful in a number of distinct diagnostic problems, since
the methods it employed would be problem-independent. The second
major contention of this thesis is that, given a model for the
diagnostic problem, effective strategies for the solution of the problem
can be formulated in terms appropriate for their implementation in a 
program. 

Such a program for diagnosis could be embedded in an environment 
(other programs) which would permit two different uses of the program. 
First, the program could be applied to actual diagnostic problems so 
that its effectiveness could be determined. Second, the environment 
could permit the study of a variety of artificial problems, each 
designed to test a particular aspect of program performance. The first 
type of application might be termed "open diagnosis"; and the second, 
"closed diagnosis." Closed diagnosis may facilitate the development 
of improved diagnostic strategies. 

In order for a diagnostic problem to exist, one must have at 
least some knowledge of the nature of the system being considered. 
Further, the various states of the system must manifest themselves
through certain observable attributes. It should also be possible
to apply tests to the system at some cost to obtain more attributes. 
Finally, the general knowledge of the system must include some com- 
prehension of the relationships among signs, states, and tests. The 
prerequisites are satisfied by the two examples of diagnostic problems
presented above. In fact, in simplest terms, this is the basis
for the diagnostic problem studied in this work.

The term attribute is used in this thesis to denote any observable
manifestation of system state which is employed in the deductive phase
of diagnosis. For example, it includes both signs and symptoms in
medicine.

A Brief Outline of a Diagnostic Process 

The basic outline of a diagnostic process is as follows. Because
the observation of certain initial attributes suggests a diagnostic
problem in some system, the diagnostician wishes to ascertain
the current state of the system. He selects a test (based on some 
criterion) and applies it to the system. The application of the test 
yields new attributes, which he uses to update his current view
of the problem. He then applies
another test and obtains more attributes. This process continues un- 
til the diagnostician makes a decision about the current state. Now 
this is a most sketchy outline of the diagnostic process. There can 
be a great deal of sophisticated information processing during each 
iteration of the process. The point is that test selection and
inference are the two principal features of diagnosis as performed in
a number of distinct areas. The outline above seems equally appli- 
cable to medical diagnosis, qualitative chemical analysis, and the 
problem of diagnosing a malfunctioning automobile. At this level, then, 
the diagnostic processes in these and other areas exhibit considerable 
similarity. Inference and test selection appear to be the keys to 
diagnostic strategies of some generality. If it could be demonstrated
that these features of the process necessarily differ fundamentally
from area to area, then there would be little hope for the
formulation of general diagnostic strategies. In fact, as will be
shown in this work, there is reason to believe quite the contrary.
It appears that, for a number of areas, problem-independent diagnostic
strategies can be developed. Note that the strategies employed by
experts in different fields may be quite dissimilar; there is no
requirement that the strategies developed here resemble theirs. The
criterion by which strategies will be judged is how effective they 
are in particular applications, not how closely they approximate those 
currently used by human experts. 

The diagnostic process then merits careful study for several 
reasons. First, as indicated above, variations of this problem arise 
in many different contexts and so the problem is of general interest. 
Second, the nature of the diagnostic problem is such that it often 
requires a great deal of intellectual effort to solve it, and any 
means of improving the problem-solving process will be of considerable
value. Finally, the general form of the problem suggests the
value of a man-machine partnership in the problem-solving process. 
Before such a partnership can be established, however, the diagnostic 
process must be carefully explored in order to determine respective 
parts to be played by man and machine. 

Some Further Comments on the Difficulties of Diagnosis 

Diagnostic problems on the whole are difficult ones, particularly 
for non-experts. Moreover, a great many diagnostic problems consti- 
tute considerable challenges to the skill of even the most expert 
diagnostician. Several factors contribute to the complexity of the 
diagnostic problem. First, an expert diagnostician must be aware of
a large number of relationships among system states and attributes.
As evidence of this, consider the considerable training required to 
develop the skills of a medical diagnostician. Observation of many 
different attributes may be required to identify a particular state, 
and a given attribute may suggest many possible states. These facts 
coupled with the often large number of states and attributes require 
the diagnostician to master considerable amounts of information. 

Often the relationships mentioned above are known only in proba- 
bilistic terms. In such a case, the task of the diagnostician is 
complicated by the need for some form of probability analysis, a 
task which generally proves quite difficult for human beings. The 
accurate assessment of probabilities for a large number of possible 
states given observed attributes requires extensive training and ex- 
perience. 

Another factor complicating the task of the diagnostician is 
the difficulty of establishing and maintaining an appropriate struc- 
ture for all the information relevant to the diagnostic area. Much 
of the usefulness of that information in the diagnostic process ac- 
crues from its organization. A major portion of the expert's skill 
is derived from his ability to associate particular attributes or at- 
tribute patterns with possible system states and subsequent testing 
strategies. Again extensive experience and training are required to 
organize the relevant information into a useful associative structure. 
Unfortunately such a structure is not easily maintained. Associations
which are seldom used may be effectively lost to the diagnostician.
As a result, his field of competence tends to become narrow. This 
tendency is accelerated when the diagnostician must devote considerable 
effort to the mastering of a continual stream of newly-relevant in- 
formation. 

A computer program to provide general diagnostic assistance to 
its user would help circumvent some of these difficulties. One of
the significant advantages to be gained from the use of a computer 
is the sheer bulk of information which it can maintain. A diagnos- 
tic program would be able to deal with extremely large information 
structures. Since the program would be independent of the content 
of the information structure which it employed, that content could 
be continually updated without affecting the operation of the pro- 
gram (although better information should result in better program 
performance).

The amount of logical and probabilistic inference with which 
the program could cope would exceed that comprehensible to a human 
being. This capability would permit the more extensive exploration 
of possible testing strategies. Because the program could consider 
more possible diagnoses than a human being, it would provide a strong 
safeguard that a particular state is not overlooked in the diagnosis. 
Finally, a diagnostic program which was "table-driven" would be of
all the more value because of its potential applicability to a 
variety of problems. 



Note that diagnostic strategies suited for a computer are not 
necessarily suited for a human diagnostician. While human diagnos- 
ticians possess many special skills and hence serve as good sources 
of information about diagnosis, the purpose of this research does 
not restrict the set of possible strategies to those employed by 
humans. The goal is to develop strategies which enable the pecu- 
liar capabilities of the computer to be exploited. Additional in- 
sight into the nature of the human diagnostic process and the dis- 
covery of ways to improve it would be a valuable, but derivative 
result of this research. 

A Preface to the Material Which Follows 

This thesis describes a computer program for diagnosis and 
presents the results of some experiments performed with this pro- 
gram. The design of the program was strongly influenced by the model 
diagnostic problem chosen for this research. Although later chapters 
contain detailed discussions of this problem, a brief summary of its 
principal characteristics is presented here to provide some perspec- 
tive on the problem. 

The statement of the diagnostic problem considered here assumes 
that the system is in one of a finite number of states. The object 
of the diagnosis is to identify the current state of the system.
Experience with similar systems is assumed to be available. This
experience is in the form of probabilities for the various states and
probabilities of attributes given state. Test costs are constant
and known. Furthermore, the application of a test does not change
the state of the system. Tests are also assumed to be accurate.
Finally, it is assumed that the decision loss for each possible
misdiagnosis is given in the same units as test costs. This work,
then, is concerned with the development of strategies to solve
diagnostic problems which can be stated in keeping with these
assumptions.
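
The assumptions listed above can be gathered into a single data
structure. The following Python sketch is only an illustration of that
idea and is not the representation used by the program described in
this thesis; the class name, the field names, and the validation rules
are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class DiagnosticModel:
    """Hypothetical container for one statement of the model problem."""
    priors: Dict[str, float]                       # state -> P(state)
    attr_given_state: Dict[str, Dict[str, float]]  # state -> attribute -> P(attribute | state)
    test_costs: Dict[str, float]                   # test -> constant, known cost
    losses: Dict[Tuple[str, str], float]           # (diagnosed i, actual j) -> loss, in cost units

    def validate(self) -> None:
        """Raise ValueError if the model violates the stated assumptions."""
        states = set(self.priors)
        if not states:
            raise ValueError("a finite, non-empty set of states is required")
        if abs(sum(self.priors.values()) - 1.0) > 1e-9:
            raise ValueError("state probabilities must sum to 1")
        if set(self.attr_given_state) != states:
            raise ValueError("attribute probabilities are needed for every state")
        for probs in self.attr_given_state.values():
            if any(not 0.0 <= p <= 1.0 for p in probs.values()):
                raise ValueError("conditional probabilities must lie in [0, 1]")
        if any(cost < 0 for cost in self.test_costs.values()):
            raise ValueError("test costs must be non-negative")
        for i in states:
            for j in states:
                if (i, j) not in self.losses:
                    raise ValueError("a loss is required for every pair of states")
```

Specializing a diagnostic program for a particular problem then amounts
to filling such a structure with the appropriate experience.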

Chapter 2 examines some of the research reported in the lit- 
erature which has direct relevance to this work. 

Chapter 3 presents two views of a diagnostic problem. In the 
first view, diagnosis is considered as a problem in pattern recog- 
nition. The implications and limitations of this view are examined. 
Then the problem of diagnosis is formulated as a sequential decision 
problem. This formulation underscores the computational problems 
associated with the determination of optimal testing strategies. 
Finally, a discussion of heuristic considerations in test selection
is presented. 

A system for the study of computer-aided diagnosis is des- 
cribed in detail in Chapter 4. This system includes both a diag- 
nostic program and a variety of programs which provide an environ- 
ment within which different diagnostic strategies can be studied. 

The next three chapters are devoted to experiments performed 
with the diagnostic system. Chapter 5 discusses the use of the system
in the diagnosis of primary bone tumors; and Chapter 6, an application
of the system to the diagnosis of congenital heart disease.
A number of other experiments with the system are discussed in 
Chapter 7. Chapter 8 presents a discussion of the results of the 
research and delineates some areas for further investigation. 



Chapter 2 
LITERATURE SURVEY 

A. Diagnostic Programs 

In recent years, there has been an increasing amount of work 
done on various aspects of diagnosis. Some of this work has been 
aimed at the development of computer programs to perform particular 
diagnostic tasks. Other work has been more oriented toward the 
study of human diagnosticians and the strategies they employ. A 
brief survey of this work is presented in this chapter. Examples 
of computer programs for diagnosis are discussed. Of particular 
interest are the diagnostic strategies and models employed by such 
programs. Finally, some broad views of diagnosis and its attendant 
difficulties are considered. 

By far the greatest concentration of research in computer- 
aided diagnosis has been focused in the area of medical diagnosis. 
A number of programs have been written which are capable of performing
diagnosis in particular medical areas. These programs, as a
rule, employ a Bayesian analysis of attributes based on a
disease-attribute probability matrix for the given set of diseases
considered. That is, the programs compute the probability of disease D
given the set of attributes A as follows:

$$P(D/A) = \frac{P(D)\,P(A/D)}{\sum_{D'} P(D')\,P(A/D')}$$

where P(D) is the a priori probability of D,
P(A/D) is the conditional probability of A given D,
and the sum in the denominator runs over all diseases D' considered.
The use of a disease-attribute model and Bayesian inference was advocated
by a number of researchers as early as 1959 (R1, R2, R3, R4).
While other means of inferring diseases from their attributes were 
suggested at this time (R5, R6), the Bayesian approach has proved 
the most widely used. In certain areas the use of analog computers 
has been explored, but this work will not be reviewed here. 
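
The Bayesian computation described above can be sketched in a few
lines of Python. This is an illustration under stated assumptions and
not the code of any of the programs surveyed: the function name and
argument layout are hypothetical, and P(A/D) is approximated as a
product over individual attributes, which presumes that attributes are
independent given the disease.

```python
from typing import Dict


def posterior(priors: Dict[str, float],
              attr_given_disease: Dict[str, Dict[str, float]],
              observed: Dict[str, bool]) -> Dict[str, float]:
    """Return P(D/A) for every disease D by Bayes' rule.

    priors:              disease -> P(D)
    attr_given_disease:  disease -> attribute -> P(attribute present / D)
    observed:            attribute -> True if present, False if absent
    """
    joint = {}
    for disease, p in priors.items():
        # Assumed independence: P(A/D) is the product of per-attribute terms.
        for attr, present in observed.items():
            q = attr_given_disease[disease][attr]
            p *= q if present else 1.0 - q
        joint[disease] = p                      # P(D) P(A/D)
    total = sum(joint.values())                 # denominator: sum over all diseases
    if total == 0.0:
        raise ValueError("observed attributes are impossible under every disease")
    return {disease: value / total for disease, value in joint.items()}


# Example call with made-up numbers for two hypothetical diseases.
example = posterior(
    {"D1": 0.3, "D2": 0.7},
    {"D1": {"fever": 0.8}, "D2": {"fever": 0.2}},
    {"fever": True},
)
```

The denominator is the same for every disease, so it acts only as a
normalizing constant; the ranking of diseases is fixed by the products
P(D) P(A/D).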

In recent years, computer programs incorporating the Bayesian 
model have been developed for problems of heart disease (R7, R8), 
thyroid disease (R9), epigastric pain (R10), Cushing's syndrome (R11)
and others. Some of these programs have enjoyed striking success in 
attaining levels of performance comparable to that of the expert hu- 
man diagnosticians. For example, a Bayesian analysis of 268 cases of 
patients with one of three thyroid problems yielded the accepted
diagnosis in 96% of the cases (R9). In a similar analysis of acquired
valvular heart disease patients, a computer program correctly identified
96% of the problems (R7). In both cases this level of performance
compares favorably with that attained by experienced diagnosticians.

In order to provide a more detailed view of the use of Bayesian 
analysis in computer-aided diagnosis, two studies will be reviewed 
here. The first is the diagnosis of congenital heart disease; and 
the second, the diagnosis of thyroid function. 

In a series of papers (R12, R13, R14), Warner, Toronto, and
Veasy have reported on the development and use of a computer program
for the diagnosis of congenital heart disease. This program employs 
fifty-seven possible attributes to classify patients into thirty-five 
different disease classes. The basic strategy employed by the pro- 
gram is the use of Bayes' rule to obtain the posterior conditional 
probabilities for the different diseases given a particular set of 
attributes. The necessary a priori disease probabilities and condi- 
tional probabilities of attributes given disease were derived from 
statistical studies of a large number of known congenital heart di- 
sease patients. In certain instances, the statistical information so 
obtained was deemed inadequate and the probabilities involved were 
estimated from 1) the available literature and 2) consideration of 
the pathologic physiology of the disease. The program takes into 
account the significance of attributes which are absent as well as 
those which are present. Thus, the absence of cyanosis is significant 
in the diagnosis. The program is also designed to account for cer- 
tain mutually exclusive sets of attributes. For instance, if one 
of a set of mutually exclusive attributes is present, it would be in- 
correct to consider the absence of the other attributes in the set 
as additional information in the diagnosis. 
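
The treatment of absent attributes and of mutually exclusive sets
described above can be illustrated as follows. This is a sketch under
assumptions, not a reconstruction of the Warner, Toronto, and Veasy
program: the function name, the attribute names other than cyanosis,
and the probabilities are hypothetical, and attributes outside an
exclusive set are treated as independent given the disease.

```python
from typing import Dict, List, Set


def pattern_likelihood(attr_given_disease: Dict[str, float],
                       present: Set[str],
                       absent: Set[str],
                       exclusive_sets: List[Set[str]]) -> float:
    """Likelihood of an attribute pattern for one disease.

    attr_given_disease maps attribute -> P(attribute present / disease).
    Present attributes contribute P; absent attributes contribute 1 - P,
    except when they belong to a mutually exclusive set in which some
    other member is present, in which case their absence adds nothing.
    """
    likelihood = 1.0
    for attr in present:
        likelihood *= attr_given_disease[attr]
    for attr in absent:
        implied = any(attr in group and group & present
                      for group in exclusive_sets)
        if implied:
            continue  # absence already follows from the present member
        likelihood *= 1.0 - attr_given_disease[attr]
    return likelihood


# Hypothetical probabilities for a single disease.
probs = {"cyanosis": 0.9, "murmur_at_apex": 0.5, "murmur_at_base": 0.3}
with_sets = pattern_likelihood(
    probs,
    present={"murmur_at_apex"},
    absent={"cyanosis", "murmur_at_base"},
    exclusive_sets=[{"murmur_at_apex", "murmur_at_base"}],
)
```

Without the exclusive set, the absence of the second murmur location
would be counted as an extra piece of evidence and the likelihood would
be understated.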

The program is used in the following way. For each patient examined,
the examining physician determines the presence or absence
of the required attributes. When the examination has been completed, 
the information obtained is punched on cards and fed to the computer

in the field. Furthermore, the accuracy of the computer
diagnosis is still improving with refinements in the data
matrix. (R-12)

Overall and Williams (R-9) developed a computer program for the 
diagnosis of thyroid function. The object was to classify patients 
into one of four classes: 1) no thyroid disease, 2) hypothyroidism,
3) euthyroidism or 4) hyperthyroidism. By analyzing 879 cases, the
authors obtained a disease-probability matrix which included 21 indices
of thyroid function. Although over 800 cases were involved in
the analysis, not all of the 21 measures were available for each
case. Relative frequencies of each attribute were based on the number
of cases in which the necessary data were available. Independence
of attributes was assumed, although the authors note that this
assumption is suspect.
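
The estimation of relative frequencies from incomplete case records
can be sketched as below. The code is an illustration only, not the
procedure of Overall and Williams: the function name, the record
layout, and the attribute names are assumptions, and attributes are
taken to be simply present, absent, or not recorded.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

Case = Tuple[str, Dict[str, Optional[bool]]]  # (disease, attribute -> True/False/None)


def estimate_conditionals(cases: Iterable[Case]) -> Dict[str, Dict[str, float]]:
    """Estimate P(attribute present / disease) as a relative frequency.

    A value of None (or a missing key) means the measure was not
    available for that case, so the case is left out of the count for
    that attribute only.
    """
    present = defaultdict(lambda: defaultdict(int))
    available = defaultdict(lambda: defaultdict(int))
    for disease, findings in cases:
        for attr, value in findings.items():
            if value is None:
                continue
            available[disease][attr] += 1
            if value:
                present[disease][attr] += 1
    return {
        disease: {attr: present[disease][attr] / n for attr, n in counts.items()}
        for disease, counts in available.items()
    }


# Three hypothetical cases of one disease; "goiter" was recorded only once.
table = estimate_conditionals([
    ("hyperthyroidism", {"tremor": True, "goiter": None}),
    ("hyperthyroidism", {"tremor": False, "goiter": True}),
    ("hyperthyroidism", {"tremor": True}),
])
```

Each entry of the resulting matrix therefore rests on a different
number of cases, which is worth keeping in mind when the estimates are
later multiplied together.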

In an extensive series of tests, the program performed extremely
well. According to the authors

. . . computer diagnoses agreed with the clinical diagnoses
in over 96% of the cases in which anything like complete
data were available. (R-9)

Both of these examples of computer-aided diagnoses lend credence to
the belief that Bayesian attribute-disease models of diagnosis may
prove extremely useful in a whole range of medical applications.

As noted earlier, not all applications of mathematical methods
to medical diagnosis have been founded on Bayesian inference. An
interesting example of a different view of the problem involves
considering a point in an n-dimensional space (where n is the
number of attributes). From past experience with diseases, one can
consider each disease as representable by a class of points in the
space. The diagnosis of the current disease is derived from a
consideration of the "closeness" of the corresponding point to the
classes for each of the known diseases respectively.¹ In a recent
paper (R-7), Lerner discusses the use of such an approach in the 
recognition of handwritten letters and the detection of oil-bearing 
strata in petroleum geology. In the latter problem (another type 
of diagnostic problem), he reports that a program based on this 
method far surpassed the performance of the most experienced experts. 
He then advocates the application of this method to problems of 
medical diagnosis and asserts that the possibilities of this approach
"considerably exceed those of doctors-diagnosticians."

¹This approach will be examined in more detail in Section A of
Chapter 3.

While this method differs markedly from that employed in the
two medical applications above, it shares with them a very important
limitation. In Chapter 1 it was suggested that the diagnostician
performs two major tasks in his problem-solving. The first task is
the interpretation of attributes manifested by the system being
diagnosed. An equally important task is the selection of an appropriate
testing strategy. All of the programs above map a set of attributes
into a diagnosis in one stage. There is no test selection function
performed in any of these programs. As a result, all the data which
are to be employed by the program must be collected before the program
is invoked. There is no opportunity for selective testing based
on an analysis of an incomplete set of attributes. Thus, it may
happen that the cost of determining a number of attributes (for ex- 
ample, by taking an X-ray) is incurred unnecessarily. While this 
may not be a major problem in the particular areas discussed above,
it is easy to think of situations in which this approach would be
highly undesirable. Consider, for example, the computer-aided diagnosis
of diseases from a group which exhibit clusters of relatively
disjoint attribute patterns. The approach outlined above requires
the determination of a full set of attributes to be made available 
to the program. Since only a small subset of the set of all attri- 
butes is necessary for a diagnosis, many attributes are unnecessary 
in any particular application. If the cost of obtaining these un- 
necessary attributes is high, then the diagnostic procedure will be 
less than satisfactory. This is because the quality of diagnosis 
should reflect its cost as well as its accuracy. As Lusted has ob- 
served (R-17), 

A great many medical diagnostic tests have been developed to 
supplement the patient information obtained from history 
and physical examination. These tests vary greatly in the 
amount of discomfort to the patient, complexity, and cost. 
It is obvious that diagnostic tests should be kept to a 
minimum. 

It seems that a more satisfactory solution is to permit the
diagnostic program to operate sequentially, choosing tests for the
user to run based on a continually updated view of the problem.
The program could engage in a dialogue with the user as it performs
both the inference and test selection functions of diagnosis. The
testing strategy evolved by the program should reflect the information
derived from the attributes observed to date, past experience
with similar systems, the cost of tests, and the relative seriousness
of various disease states. Part of the research reported in this
thesis is aimed at developing a program which satisfies these
requirements.
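
One simple way to make a testing strategy reflect these four factors
is a one-step lookahead: a test is worth running only if the expected
reduction in misdiagnosis loss exceeds its cost. The Python sketch
below illustrates that rule and nothing more. It is not the test
selection function of the program developed in this thesis; the
function names, the restriction to tests with two outcomes, and the
numbers in any example are assumptions.

```python
from typing import Dict, Optional, Tuple

Loss = Dict[Tuple[str, str], float]         # (diagnosed i, actual j) -> loss
Tests = Dict[str, Tuple[float, Dict[str, float]]]  # test -> (cost, state -> P(positive / state))


def expected_loss(probs: Dict[str, float], loss: Loss) -> float:
    """Expected loss of the best immediate diagnosis."""
    return min(sum(loss[(i, j)] * probs[j] for j in probs) for i in probs)


def update(probs: Dict[str, float], positive_given_state: Dict[str, float],
           positive: bool) -> Dict[str, float]:
    """Bayes update of the state probabilities after one test outcome."""
    joint = {s: p * (positive_given_state[s] if positive else 1.0 - positive_given_state[s])
             for s, p in probs.items()}
    total = sum(joint.values())
    return {s: v / total for s, v in joint.items()}


def select_test(probs: Dict[str, float], tests: Tests,
                loss: Loss) -> Tuple[Optional[str], float]:
    """Pick the test with the largest positive net gain, or None to stop."""
    current = expected_loss(probs, loss)
    best_name, best_gain = None, 0.0
    for name, (cost, positive_given_state) in tests.items():
        p_positive = sum(probs[s] * positive_given_state[s] for s in probs)
        after = 0.0
        for positive, p_outcome in ((True, p_positive), (False, 1.0 - p_positive)):
            if p_outcome > 0.0:  # skip outcomes that cannot occur
                after += p_outcome * expected_loss(
                    update(probs, positive_given_state, positive), loss)
        gain = current - after - cost   # loss reduction minus test cost
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain
```

When `select_test` returns `None`, no single test pays for itself and
the diagnosis with the smallest expected loss would be reported. A
rule of this kind looks only one test ahead, so it can miss sequences
of tests that are valuable only in combination.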

Less has been done with computer-aided diagnosis in other areas.
One problem which has received attention, however, is the diagnosis
of faults in a computer. Although the problems here are not well
understood at present, recent research (R-18) shows considerable
promise. Significant results pertaining to the selection of an optimal
set of diagnostic tests have been obtained (R-19), but they are
restricted to the case of a single fault.

B. Perspectives on Diagnosis 

One of the chief motivations for this research is the belief that
a computer is potentially a very useful tool to be employed in di- 
agnostic problems. The need for such a tool becomes apparent when 
the difficulty of particular diagnostic problems is considered. 

A considerable portion of the effort expended in implementing 
computer programs is devoted to program debugging. As programming 
applications become increasingly sophisticated, the complexity of 
the associated problems of debugging increases at an equally rapid 
rate. The tremendous effort required to debug a large operating 
system is a testament to the magnitude of the diagnostic problem
involved. This is so even though many of the programmers involved
in such an effort are experts. 

The non-expert who ventures into the world of programming 
also faces many diagnostic problems. Often the magnitude of these 
problems relative to his limited programming skill and experience 
is such as to prevent him from effectively using the computer in 
his particular research. In both these cases, there is a need for 
an improved diagnostic facility. Research into the potential use- 
fulness of diagnostic computer programs seems especially appropri- 
ate in this context. 

Much the same situation exists in medicine, although here 
there exists more explicit evidence of the difficulty of problems in
medical diagnosis and the need for new aids in the problem-solving 
process. Physicians receive extensive training in their profession, 
and they devote considerable efforts to the development of their 
diagnostic acumen. For all their training, however, the difficulties 
of the diagnostic problems confronting them have resulted in a sur- 
prisingly low level of performance. In a recent research report of 
the United States Public Health Service entitled "Completeness and 
Reliability of Diagnosis in Therapeutic Practice," the author
concludes from an extensive study

On the basis of available evidence, I estimate if we re- 
gard all diagnosable diseases at a given time that are con- 
sidered of significance for current health as 1, the num- 
ber of therapeutically determined diseases constitute 
numerically 0.4. Of this 0.4 nearly half are conditions
diagnosed incorrectly. This suggests that correctly



Chapter 3 
TWO VIEWS OF DIAGNOSIS 

This chapter concerns the theoretical framework for the 
study of computer-aided diagnosis. Here the nature of the diag- 
nostic problem is examined and the model for the problem is de- 
veloped. Two views of diagnosis are considered. The first view 
is that of diagnosis as a pattern recognition problem. This con- 
sideration brings into focus those features of the diagnosis which 
distinguish it from the "classical" pattern recognition problem. 
The second view involves analyzing diagnosis as a problem in sequen- 
tial decision-making. The problems arising from this formulation 
are explained and various means of circumventing these problems 
are discussed. The view of diagnosis as sequential decision- 
making is the one taken for this research and so this discussion 
leads directly to the specification of a computer program for per- 
forming general diagnosis. 

In the following chapter, a discussion of a program to perform
general diagnosis is presented within the framework of the program 
actually implemented as part of this research. Each of the major 
logical functions of the program is discussed in turn with the em- 
phasis on the way in which these functions match the requirements 
of a diagnostic process. In a very real sense, the program can be 
taken as a statement of an overall diagnostic strategy for computer-
aided diagnosis.

A. DIAGNOSIS AS A PROBLEM IN PATTERN RECOGNITION 
Consideration of the diagnostic problem as a pattern recognition
problem focuses attention on some of the more significant aspects
of the problem. Also it is quite natural to conceive of diagnosis 
as a pattern recognition problem. The observable attributes associ- 
ated with the system of interest in a diagnostic problem do consti- 
tute a pattern which is the direct evidence upon which a classifi- 
cation decision is based. Thus a medical diagnostician confronted 
with an ailing patient employs his observations of the patient's
symptoms and signs in conjunction with his experience and training 
to deduce the nature of the patient's problem. While there are 
many features which are shared by the diagnostic problem and a wide 
variety of particular pattern recognition problems, there are addi- 
tional constraints on the former which add to its complexity. The 
purpose here is to explore both the similarities and differences 
between the diagnostic problem and the "classical" pattern recog- 
nition problem. 

The classical pattern recognition problem is fundamentally one 
of recognizing class membership and establishing decision criteria 
for measuring membership in each class. Given a set of pattern
classes the problem is to assign a new pattern to one of the classes. 
For example in the recognition of handwriting, knowledge of the gen- 
eral properties of individual letters is utilized in the determination
of the identity of that segment of handwriting which is currently
of interest. The individual pattern classes may be known
in a variety of ways ranging from a set of representative patterns 
to a functional characterization of the probabilistic process by 
which patterns of the class are generated. In general, a pattern is 
comprised of a set of features, each feature being represented by
some numerical value. In the handwriting recognition problem, an 
unknown letter could be represented by numerical values for such fea- 
tures as the height, number of loops and the number of intersections 
the letter makes with certain reference lines. Such a representa- 
tion leads quite naturally to the representation of a pattern as an 
n-dimensional vector where n is the number of features which are
taken to be relevant to the classification problem. 

Hence, each pattern class can be conceived of as a set of 
points in an n-dimensional space. Similarly, any pattern which
is to be classified can be represented as a point in the space 
(provided, of course, the same set of features obtains). The problem 
of classifying a new pattern sample involves determining the "close- 
ness" of the sample to each of the respective classes. For instance, 
we may decide a certain letter is an "e" because it more closely
resembles representatives of the class of known "e's" than
representatives of other classes of letters. In the n-dimensional space,
this corresponds to measuring the distance (in some abstract sense)
between the point denoting the new pattern and those representative of
the various classes. The problem of establishing criteria by which
the "resemblance" of a particular letter to the class of letters
known to be "e's" is judged is but one instance of the general problem
of deciding exactly how the "closeness" of a sample to various classes
is to be established. For a given application, the determination of an
appropriate metric is a fundamental problem of pattern recognition. 
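
The notion of "closeness" described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each class is summarized by the mean of its representative patterns and that ordinary Euclidean distance is the metric; the feature values and the function names (`class_mean`, `classify`) are hypothetical and are not taken from this work. Other metrics would serve equally well in the same framework.

```python
import math


def class_mean(patterns):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(patterns[0])
    return [sum(p[i] for p in patterns) / len(patterns) for i in range(n)]


def euclidean(x, y):
    """Euclidean distance between two points of the n-dimensional space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def classify(sample, classes):
    """Return the label of the class whose mean is nearest to the sample."""
    return min(
        classes,
        key=lambda label: euclidean(sample, class_mean(classes[label])),
    )


# Hypothetical features: (height, number of loops, number of intersections)
letters = {
    "e": [(1.0, 1, 2), (1.1, 1, 2), (0.9, 1, 3)],
    "l": [(2.0, 1, 1), (2.2, 1, 1), (1.9, 0, 1)],
}

print(classify((1.05, 1, 2), letters))
```

The choice of class means and Euclidean distance is only one possibility; the point of the sketch is that the classification rule is fixed entirely by the chosen metric.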

Consider the schematic of a pattern recognition problem pre- 
sented in Figure 1. Here two pattern classes are of interest, 
classes A and B. In this case, there are two features in the pat- 
terns and an orthogonal coordinate system corresponding to these fea- 
tures is shown. Notice that in this simple example all members of 
class A are "closer" to all other members of class A than to any 
member of class B and vice versa. Unfortunately, this condition
does not hold in general. The more common case is to have "close" or
intersecting pattern classes. Members of a class can be closer to
members of another class than to certain other members of the same
class. For example, some handwritten "e's" look very much like "i's"
and vice versa. A schematic of intersecting pattern classes is
presented in Figure 2. The problem of recognizing the pattern x in
these figures involves establishing a metric which can be employed to 
decide whether x is "closer" to the class A or the class B (or in 
some cases deciding that x is a member of neither A nor B). The 
actual decision regarding the identity of x can be based on the cost 
of misclassification as well as the chosen metric.

Figure 1. Two Pattern Classes

Figure 2. Intersecting Pattern Classes

When the pattern classes are inherently close or intersecting
in the space, recognition is more difficult. In some cases matters 
can be improved by devising class-separating transformations. Such
a transformation has the property that the classes resulting from 
the application of the transformation to the original classes are 
more separated from one another in the transform space. Figure 3 
represents the effect of a class-separating transformation on classes 
A and B. The particular transformation will depend on both the 
characteristics of the classes to be transformed and the constraints 
placed upon the transformation. Suffice it to say here that trans- 
formations of this type can be derived by solving constrained op- 
timization problems. Given such a transformation, the pattern to 
be recognized is first transformed and then its "distance" from 
each of the transformed classes is measured. It is this distance 
in the transform space which is incorporated in the classification 
decision rule. 

The problem of diagnosis has much in common with the pattern 
recognition problem discussed above. The pattern classes in the 
pattern recognition problem correspond to the system states in the 
diagnostic problem, and there is a similar analogy between particular 
patterns and sets of attributes. The object of diagnosis is to classify
a set of attributes as being a manifestation of a particular system
state. Again, the notions of an n-dimensional space and vector
representations of attribute sets are suggested. There is an important
difference between diagnosis and the pattern recognition method
outlined above. In the latter, it was assumed that a pattern to be
recognized is given as a point in the sample space. This implies 
a complete specification of the corresponding vector. In the usual 
diagnostic problem, the pattern of attributes is incompletely specified.
There exist means for obtaining the values of unspecified components
of this vector (tests which can be run, etc.), but in general
there is a cost associated with the use of these means. These costs 
make it advantageous to analyze the diagnostic problem sequentially 
and to make decisions based on an incompletely specified attribute 
vector. Doctors, for example, make diagnostic decisions without 
performing all possible tests on the patient. 

Note that this distinction between pattern recognition techniques
and diagnostic techniques is not a necessary one. Certain pattern
recognition schemes have employed sequential methods while most medical
diagnosis programs have avoided sequential analysis entirely. The
distinction, however, does have appreciable generality.

Thus, in the diagnostic problem, one is concerned throughout
with subspaces of the sample space. The dimensionality of the subspace
which contains the pattern vector is reduced by obtaining previously
unspecified values for certain pattern features. In general,
each value so obtained reduces the dimensionality of the subspace in
which the point corresponding to the fully specified attribute set
must lie. Because of the costs associated with the tests for particular
attributes, a good diagnostic scheme must include some means for
assessing the expected value of a test in determining the class to
which the attribute vector belongs. While the sequential nature of
the diagnostic process complicates its realization, it also offers
a potential advantage over the pattern recognition scheme described
above. Although an attribute vector may be incompletely specified,
the subspace corresponding to it may include only one class. In
such a case it may be possible to make the classification decision
at that point without investigating the remaining attributes. This
reduction in the amount of the processing required for a classification
decision is especially significant when many of the system
states are represented by disjoint subspaces in the n-dimensional
sample space. This reduction can be obtained only if the diagnostic
scheme incorporates some stopping rule for the attribute sampling
(or testing) process.
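
The stopping rule suggested above can be sketched as follows. This is a deliberately simplified illustration, not the procedure developed in this work: it assumes that each state is known through a list of fully specified binary attribute patterns, that tests are run in a fixed order, and that testing stops as soon as the observed attributes are consistent with at most one state. The data and the names `consistent_states` and `diagnose_sequentially` are hypothetical.

```python
def consistent_states(observed, state_patterns):
    """States having at least one known pattern that agrees with every
    attribute value observed so far."""
    result = []
    for state, patterns in state_patterns.items():
        if any(all(p[a] == v for a, v in observed.items()) for p in patterns):
            result.append(state)
    return result


def diagnose_sequentially(test_order, run_test, state_patterns):
    """Run tests one at a time; stop once at most one state remains."""
    observed = {}
    candidates = consistent_states(observed, state_patterns)
    for attribute in test_order:
        if len(candidates) <= 1:
            break  # stopping rule: the subspace includes only one class
        observed[attribute] = run_test(attribute)
        candidates = consistent_states(observed, state_patterns)
    return candidates, observed


# Hypothetical example: three states described by three binary attributes.
state_patterns = {
    "A": [{"fever": 1, "cough": 1, "rash": 0}],
    "B": [{"fever": 1, "cough": 0, "rash": 0}],
    "C": [{"fever": 0, "cough": 0, "rash": 1}],
}
true_attributes = {"fever": 1, "cough": 0, "rash": 0}

candidates, observed = diagnose_sequentially(
    ["fever", "cough", "rash"], lambda a: true_attributes[a], state_patterns
)
print(candidates, observed)
```

In this example the third attribute is never examined, since the first two already isolate a single state.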

So while the pattern recognition problem and the diagnostic 
problem have a number of features in common, there are significant 
differences between the strategies indicated for their solution. The 
former problem concerns the classification of a fully-specified 
vector into one of a number of known classes. The latter problem 
is equally one of classification, but the initial specification of 
the vector is generally incomplete. Part of the problem is to as- 
certain which tests to run (at some cost) to obtain a more complete 
specification of the vector. Decisions based on an incompletely 
specified vector are the rule rather than the exception. Note, 
however, that there may well be inherently close or intersecting
classes in the diagnostic problem as in the pattern recognition
problem. 

One aspect of the pattern recognition problem which was not 
discussed above was that of choosing the coordinate system for the 
sample space. This has a direct and significant analogy in diagnosis. 
In the discussion of pattern recognition, it was assumed that the 
pattern features were given. The efficiency and the accuracy of 
the recognition scheme often can be improved by the selection of a 
new coordinate system (set of features). The problem of establishing
the coordinate system is often termed the pattern detection
problem. 

Thus, for example, in Figure 1 the dotted coordinates are in 
a sense more efficient, for they permit the characterization of 
classes A and B solely in terms of one coordinate. Again general 
mathematical techniques are known for establishing "good" coordi- 
nate systems for a number of problem types. 

Clearly, a similar situation obtains in diagnostic problems. 
Generally speaking, the attributes considered in diagnostic problems 
are chosen without any particular regard for the efficient separation 
of pattern classes. It is apparent, however, that there is potential 
value in conducting such an analysis for a given problem area. In 
certain areas, especially in medical diagnosis, there has been an
increasing awareness of the importance of the proper choice of pattern 
features; a number of articles on the "taxonomy of disease"
have appeared in the literature. 1 While this problem is an extremely
interesting one, it is beyond the scope of this thesis. Here the
pattern features or attributes for any particular area are taken as
given.

This discussion provided only a brief overview of pattern recog- 
nition and its relation to diagnosis. The particular type of pattern 
recognition which constitutes diagnosis will be explored in con- 
siderable detail in other sections of this work. 

1 In recent years, there has been much medical work directed
at developing specific tests for diseases. Thus a particular
attribute (test result) may indicate exactly one disease.

B. DIAGNOSIS AS A SEQUENTIAL DECISION PROBLEM
In this section, the problem of diagnosis is formulated in
terms of statistical decision theory. This formulation is in very
general terms, but it suggests a number of the factors which complicate
particular diagnoses. In many areas of diagnosis, attention is
focused on a system. In medicine the system is a human being; in
program debugging, a computer program. The object of the diagnostic
problem is to determine the state of the system (e.g. the disease in
the person or the error in the computer program). This state is one
of a finite but perhaps quite large number of possible states.
Information about the state of the system can be obtained by performing
a variety of tests on the system. Information obtained from testing
coupled with experience with other diagnostic problems is employed
by the diagnostician in his attempt to deduce the state of the system.
In this work, the goal of diagnosis is taken to be the determination
of the state of the system of interest. It is assumed that
knowledge of the system state will greatly facilitate further
(non-diagnostic) action. For example, the identification of the state
of a patient as "tuberculosis" may lead directly to a course of
treatment. The system under consideration here is a finite state
machine. The diagnostician knows about all the states of the machine
in the sense that he has available probability distributions which
characterize the response of the machine to certain tests given the
machine state. In particular, this information relates attributes,
the results of the tests, to particular system states. At the
outset of the problem, the machine is in a particular, but unknown
state and the task of the diagnostician is to employ the available
tests to obtain information about the identity of that state. Tests
are assumed to be free from error and it is further assumed that they
do not alter the state of the system.

1 An attribute is binary-valued. That is, each attribute is
either present or absent. A test is used to determine the presence
or absence of some number (perhaps greater than one) of attributes.

Associated with each test is a cost of applying it to the system
(called the testing loss) and thus it is advantageous to make a
decision about the state of the system based on a limited number of
tests. On the other hand there is a decision loss associated with an
incorrect decision. The loss resulting from each particular decision
about the unknown state as a function of the actual state is given
by a loss function for the problem. For example, the loss resulting
from the decision that a tumor is benign when it is in fact malignant
is very large and a diagnostic procedure for tumors should take
cognizance of this fact. In general, the possibility of loss for
an incorrect decision indicates the value of extensive testing prior
to any decision. The problem is to balance the testing loss and the
decision loss in a sequential decision function for the problem.
This function would specify a diagnostic procedure such that the
total expected loss of the final decision is minimized. The following
is a formal statement of this problem.

1. The states of the machine M are $M_j$, $j = 1, \ldots, n$,
and the current state is denoted by $M_y$. It is assumed
that $M_y$ does not change during the course of the
diagnosis.

2. $\pi = (\pi_1, \ldots, \pi_n)$ is a vector of a priori probabilities
for $M_y$. That is, $\pi_i = P(M_y = M_i / \varepsilon)$,
and $\varepsilon$ denotes experience.

3. $T = \{t_1, \ldots, t_r\}$ is the set of available tests.

4. $(t_i)_q$ is a vector of length q with each $t_i \in T$. It represents
a series of tests with test $t_i$ being run at the i-th
stage.

5. $S = \{s_1, \ldots, s_m\}$ is the finite set of possible attributes
for M and the set T.

6. $(S_i)_q$ is a vector of length q with each $S_i \in S$. It denotes a
sequential set of attributes.

7. $d_t$ is a terminal decision and $d_t \in D_t$, where $D_t$ is the finite
set of all possible terminal decisions.

8. $C((t_i)_q, (S_i)_q)$ is the testing loss for a sequence of tests
$(t_i)_q$ resulting in the attribute sequence $(S_i)_q$ followed by
terminal decision at stage q+1.

9. $P((S_i)_q / M_j)$ is the conditional mass function for $(S_i)_q$
given $M_j$.

10. $P((t_i)_q, d_t / (S_i)_q)$ is the conditional mass function for the testing
sequence $(t_i)_q$ followed by terminal decision $d_t$ given the
attribute sequence $(S_i)_q$.

11. $L(\pi, d_t)$ is the decision loss function.

12. $\theta(d_t / (t_i)_q, (S_i)_q)$ is the sequential decision function to be
determined.

Let $L_1(\pi, \theta)$ = the average decision loss and
$L_2(\pi, \theta)$ = the average testing loss;
then the problem is to determine $\theta$ such that

$$ L_1(\pi, \theta) + L_2(\pi, \theta) $$

is a minimum, where

$$ L_1(\pi, \theta) = \sum_{q=0} \sum_{T_q} \sum_{S_q} \sum_{D_t} \sum_{j} \pi_j \, L(\pi, d_t) \, \theta(d_t / (t_i)_q, (S_i)_q) \, P((S_i)_q / M_j) $$

$$ L_2(\pi, \theta) = \sum_{q=0} \sum_{T_q} \sum_{S_q} \sum_{D_t} \sum_{j} \pi_j \, P((t_i)_q, d_t / (S_i)_q) \, C((t_i)_q, (S_i)_q) \, P((S_i)_q / M_j) $$

and $T_q$ is the set of all $(t_i)_q$
and $S_q$ is the set of all $(S_i)_q$.
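
As a concrete reading of two of the quantities in this formulation, the following minimal sketch revises the distribution over states after some attributes have been observed and then evaluates the expected decision loss of each terminal decision. It rests on assumptions that are not part of the formal statement: attributes are binary and are treated as conditionally independent given the state, the loss is given as a table `loss[i][j]` (the cost of deciding state i when the actual state is j), and the observed attributes have non-zero probability under the prior. All names and numbers are hypothetical.

```python
def posterior(prior, likelihoods, observed):
    """Revise the distribution over states by Bayes' rule.

    prior[j]              -- a priori probability of state j
    likelihoods[j][a]     -- P(attribute a present / state j)
    observed[a]           -- True if attribute a was found present
    Assumes conditional independence of attributes given the state.
    """
    weights = []
    for p_j, lik in zip(prior, likelihoods):
        w = p_j
        for a, present in observed.items():
            w *= lik[a] if present else 1.0 - lik[a]
        weights.append(w)
    total = sum(weights)  # assumed non-zero
    return [w / total for w in weights]


def expected_decision_loss(dist, loss, i):
    """Expected loss of terminal decision i under the distribution dist."""
    return sum(p * loss[i][j] for j, p in enumerate(dist))


def best_terminal_decision(dist, loss):
    """Terminal decision with the smallest expected decision loss."""
    return min(
        range(len(loss)), key=lambda i: expected_decision_loss(dist, loss, i)
    )


# Hypothetical two-state problem with one observed attribute "s".
prior = [0.5, 0.5]
likelihoods = [{"s": 0.8}, {"s": 0.2}]
loss = [[0, 10], [1, 0]]  # deciding state 0 when state 1 holds is costly

dist = posterior(prior, likelihoods, {"s": True})
print(dist, best_terminal_decision(dist, loss))
```

In this example the more probable state is not the chosen decision, because the loss table penalizes one kind of misdiagnosis far more heavily than the other.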

The great difficulty with this problem is not conceptual but
computational. For finite sets of attributes and decisions, the optimal
solution can be obtained in principle by laying out a decision tree.
Such a tree includes two types of nodes: decision nodes and "nature's
nodes." Nodes of the former type are characterized by 1) a current
view of the diagnostic problem as embodied in the probability
distribution over the states of the system (this distribution accounts
for both the attributes observed to date and the a priori likelihood
of system states in a manner to be made explicit later in this thesis),
and 2) a branch emanating from the node for each alternative available
to the decision-maker at the node. In the context of diagnosis, then,
there is at each decision node one branch for each possible test which
can be run and one branch corresponding to a terminal decision. Once
an alternative branch away from a decision node has been chosen by
the decision-maker, a particular one of nature's nodes is encountered.

Such a node represents the possible outcomes of the decision
corresponding to the branch which leads to the node. Each of these
"outcome branches" leads to a new decision node. A portion of such a
decision tree is shown in Figure 4. The node A is a decision node
which is characterized by the prior probability distribution and
history embodied in the path to the node. There is a branch from this
node for every relevant test (given the history and $\pi$) as well as a
branch corresponding to a terminal decision. If a particular test is
chosen, say test $T_1$ in the diagram, a new node (here node B) is
obtained. This node is one of the "nature's nodes" mentioned above.
There is a branch from this node for each possible test outcome given
$T_1$, and given the state of the diagnosis at B, the conditional
probability for each attribute branch can be computed.

Figure 4. Section of a Decision Tree

If it is assumed that the total number of potentially useful 
test sequences is finite, then the entire tree for the diagnosis can
be specified. By folding back this tree in terms of expected loss, 
one can obtain an optimal decision for every decision node on the 
tree. This problem is amenable to techniques such as dynamic pro- 
gramming. There is little conceptual difficulty in solving the 
problem. 
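
The folding-back computation can be sketched for a small problem as follows. The sketch makes several simplifying assumptions that are not part of the discussion above: each test examines a single binary attribute, attributes are conditionally independent given the state, each test is run at most once, and the loss is given as a table `loss[i][j]` (the cost of deciding state i when the actual state is j). The function name `fold_back` and all data are hypothetical.

```python
def fold_back(dist, remaining, likelihoods, test_cost, loss):
    """Return (minimum expected total loss, best action) at a decision node.

    dist        -- current probability distribution over the states
    remaining   -- frozenset of tests not yet run
    likelihoods -- likelihoods[j][t] = P(test t positive / state j)
    test_cost   -- test_cost[t] = testing loss of running test t
    loss        -- loss[i][j] = decision loss of deciding i in state j
    """
    n = len(dist)
    # Terminal-decision branch: best decision under the current distribution.
    stop_loss, stop_decision = min(
        (sum(dist[j] * loss[i][j] for j in range(n)), i) for i in range(n)
    )
    best_loss, best_action = stop_loss, ("decide", stop_decision)

    for test in sorted(remaining):
        # Nature's node: average over the possible outcomes of the test.
        p_present = sum(dist[j] * likelihoods[j][test] for j in range(n))
        expected = test_cost[test]
        for present, p_outcome in ((True, p_present), (False, 1.0 - p_present)):
            if p_outcome <= 0.0:
                continue  # outcome cannot occur; its branch contributes nothing
            new_dist = [
                dist[j]
                * (likelihoods[j][test] if present else 1.0 - likelihoods[j][test])
                / p_outcome
                for j in range(n)
            ]
            sub_loss, _ = fold_back(
                new_dist, remaining - {test}, likelihoods, test_cost, loss
            )
            expected += p_outcome * sub_loss
        if expected < best_loss:
            best_loss, best_action = expected, ("test", test)
    return best_loss, best_action


# Hypothetical example: two equally likely states and one decisive test "x".
prior = [0.5, 0.5]
likelihoods = [{"x": 1.0}, {"x": 0.0}]
loss = [[0, 10], [10, 0]]

print(fold_back(prior, frozenset({"x"}), likelihoods, {"x": 1.0}, loss))
print(fold_back(prior, frozenset({"x"}), likelihoods, {"x": 6.0}, loss))
```

When the test is cheap it is worth running before deciding; when its cost exceeds the expected decision loss of an immediate decision, the recursion selects the terminal decision instead. The recursion visits every node of the tree, which is exactly the source of the computational difficulty discussed next.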

The difficulty is the exponential growth of the number of de- 
cision nodes with the number of signs and tests. Since diagnostic 
problems involving large numbers of possible attributes are common, 
it is expected that the problems of searching large decision trees 
contribute a large part of the complexity of specific diagnostic 
problems. One of the major concerns of this research is with the 
development of effective heuristics for this tree searching problem. 
While such heuristics produce sub-optimal solutions, it is possible
that the reduction in the size of the search space may more than
offset this disadvantage. 

As an indication of the potential size of such a problem, con- 
sider the diagnosis of a ten-state, twenty-attribute system. Such 
a case might arise when one was attempting to employ twenty attrib- 
utes to classify a person into one of ten disease groups. Assuming 
that there is a test for the presence or absence of each attribute 
and that each test is run but once, the number of decision nodes in 
the decision tree for the problem can be expressed as 

$$ {}_{n}N_{k} = 2^{k} \frac{n!}{(n-k)!} $$

where ${}_{n}N_{k}$ = the number of decision nodes,
k = the depth of the tree,
n = the number of tests.

For this example, n is 10, and the number of decision nodes in a
tree of depth k is given by

$$ {}_{10}N_{k} = 2^{k} \frac{10!}{(10-k)!} $$

Table 1 gives values of ${}_{10}N_{k}$ for selected values of k. Notice the
extremely rapid increase of ${}_{10}N_{k}$ with k. Also, at any given decision
node at depth k it is necessary to compare (n-k+1) decisions (one for 
each of the n-k remaining tests and one for the possible terminal 
decision). Although in many cases such an attribute set is highly
redundant, it is quite possible that a depth of 5 may be required
for an optimal decision. In such a case there are still almost a
million decision nodes. Even in the simple case of a specific test
for each state, there are $n!$ different decision nodes, where n is
the number of states. Again the growth of the decision tree with 
n is enormous. 
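
The growth described here is easily checked numerically. The short sketch below evaluates the expression $2^k \, n!/(n-k)!$, which counts the ordered sequences of k distinct tests multiplied by the $2^k$ possible sequences of binary outcomes; the function name `decision_nodes` is hypothetical.

```python
from math import factorial


def decision_nodes(n, k):
    """Number of decision nodes at depth k with n binary-outcome tests,
    each test being run at most once."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return 2 ** k * factorial(n) // factorial(n - k)


for k in (1, 2, 5, 10):
    print(k, decision_nodes(10, k))
```

For n = 10 and k = 5 the expression gives 967,680, in agreement with the figure of almost a million decision nodes cited above.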

While there are certain factors in particular diagnostic
areas which allow the decision tree to be considerably reduced in
size, the determination of an optimal testing strategy remains
computationally infeasible for the most part. The value of good
heuristics is apparent from considerations such as the above.

C. HEURISTIC CONSIDERATIONS IN TEST SELECTION
As previously noted, the problem of obtaining an optimal test- 
ing strategy for a particular diagnostic area generally will be 
computationally infeasible. Many diagnostic areas are character- 
ized by overlapping attribute patterns for different states and highly 
redundant attribute patterns, however, and there is strong motivation
for developing "good" diagnostic strategies. Unnecessary or re- 
dundant tests may exact a high cost which could be avoided by a more 
efficient testing strategy. In certain areas of medicine, tests 
are quite costly and may cause the patient considerable discomfort. 
If such tests contribute little additional information to the 
diagnosis, it is especially important that these tests not be em- 
ployed. A second difficulty is that a poor sequence of tests may 
generate results which, being unnecessary for a diagnosis, simply 
tend to obscure the truly relevant attributes. One approach to 
this problem was mentioned earlier. This approach consists essen- 
tially of sharpening the taxonomy of the problem states. While 
success here can substantially reduce the redundancy in attribute 
patterns, it will not necessarily make the determination of an
optimal testing strategy computationally feasible. While the
possibilities of this approach are extremely interesting, they will not
be considered here. For the purposes of this work, it is assumed 
that in any diagnostic area, the attributes for states are given. 
No attempt is made to improve on the efficiency of the given attributes
with regard to the characterization of the states.

A second approach to the problem of test selection is to develop
heuristics for the selection process. Such heuristics would
employ only limited segments of the decision tree in evaluating 
the potential efficacy of relevant tests. The general nature of 
the diagnostic problem is such as to offer two distinct means of 
controlling the growth of the number of decision nodes considered. 
The size of the decision tree (the number of decision nodes) de- 
pends on the number of tests considered at any decision node, and 
the depth of the analysis of that tree. By restricting either of 
these quantities, the diagnostician can limit the growth of the 
tree. In this discussion, heuristics which limit the number of
branches from a decision node will be called breadth-limiting,
and those which limit look-ahead, depth-limiting. In what follows,
the set of relevant tests for a particular decision node will be 
taken to mean all those tests which can result in a sign which is 
manifested by at least one state with a non-zero probability in 
the prior for the node. The set of relevant tests is a subset of 
the set of all tests. 

Breadth-limiting heuristics are easily formulated. Perhaps
the simplest is to limit the number of branches from a decision 
node to some fixed number. If this number is less than the number 
of possible test branches for a given node, then a decision rule 
for selecting (or rejecting) branches must be established. In terms 
of the diagnostic problem, this means selecting a subset of the
relevant tests for consideration given a prior distribution for
the unknown state. 

Heuristics which limit the number of branches from a decision 
node to a certain fixed number have several shortcomings. Principal 
among these is the problem of the selection decision rule. If 
certain tests are to be selected over other tests, then some measure 
of test effectiveness should be employed. That is, one test is 
chosen over another because by some standards the former is more 
promising. The difficulty with this is that almost any reasonable 
measure of expected test effectiveness requires information obtained 
from a look-ahead in the decision tree. To assess the potential 
value of a particular test, one needs to consider the likelihood 
of various test results and the value of these results in improving 
the current view of the diagnostic problem. If this look-ahead is 
performed, the purpose of the heuristic is defeated. A breadth-
limiting heuristic is intended to select a subset of relevant tests
without employing a look-ahead procedure. Then this subset is 
subjected to further analysis. 

Since a breadth-limiting heuristic probably should not employ
a look-ahead to obtain information, the only information upon which
it should make its decisions is that contained in the current prior
distribution and the test cost data. 1 Thus one possible breadth-
limiting heuristic is "At any decision node consider at most 5 tests
in order of increasing cost." This heuristic obviously ignores all
the information embodied in the current prior distribution, and so
while it limits the breadth of the decision tree, it does not appear
to be a particularly good heuristic.

1 This may be overly restrictive, since one can imagine breadth-
limiting heuristics which employ a priori probabilities. Such
heuristics are not in general very sophisticated, and are not considered here.
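
The cost-ordered heuristic quoted above amounts to little more than a sort. The following sketch is one hypothetical rendering of it; the test names, the costs, and the function name `cheapest_tests` are illustrative assumptions only.

```python
def cheapest_tests(relevant_tests, test_cost, limit=5):
    """Return at most `limit` of the relevant tests, in order of
    increasing cost. The current prior distribution is ignored."""
    return sorted(relevant_tests, key=lambda t: test_cost[t])[:limit]


# Hypothetical test costs in arbitrary units.
test_cost = {
    "history": 1,
    "blood count": 5,
    "chest X-ray": 20,
    "biopsy": 200,
    "urinalysis": 4,
    "ECG": 15,
}

print(cheapest_tests(list(test_cost), test_cost))
```

That the function takes no account of the distribution over states makes plain why the heuristic is of limited value.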

An alternative breadth-limiting heuristic employs the current
prior distribution to generate the subset of relevant tests which are 
to be considered. For each state there are a number of relevant 
tests. These tests may produce an attribute which is significant in 
the diagnosis of the state. Consider, for example, a problem in 
medical diagnosis in which one of the diseases which currently is 
being considered as the explanation of the patient's problem is 
tuberculosis. Since a chest X-ray is a useful test in the diagnosis 
of this disease, it would be considered a relevant test. On the 
other hand, the absence of any attributes associated with an in- 
jured ankle would exclude an X-ray of the ankle from the set of rele- 
vant tests at this stage in the diagnosis. The union of the sets of 
tests relevant to currently possible states is the set of all rele- 
vant tests. By limiting the number of states considered, one can 
limit the number of branches at the decision node. A heuristic of 
this type is "Create the total set of relevant tests from the sets 
of relevant tests for the three most probable states (based on the 
current prior)." In the above example, if tuberculosis were currently
the most probable disease, the diagnostician might choose to
consider only those tests which are relevant to tuberculosis and
ignore all others. Note that such a heuristic is only potentially
breadth-limiting. There is no guarantee that any test branches are
excluded in this way since the same set of tests may be relevant to 
all states currently being considered. Also the actual number of 
branches from a given decision node is not specified and generally 
will vary from node to node. 

Such a heuristic has intuitive appeal, however, because it
prunes branches corresponding to tests for attributes specific to 
improbable states. If an attribute for an improbable state is also 
manifested by a state which is currently quite likely, however, then 
the appropriate test will be included in the set of those considered. 
The weakness of this heuristic lies in its sensitivity to the current 
probability distribution on the states of the system. This distribu- 
tion can undergo radical change upon the observation of one new 
attribute. Thus, states which were previously unlikely can become 
very probable as a result of one new observation. This phenomenon 
cannot be accounted for by breadth-limiting heuristics based on the
current prior distribution. In fact, no breadth-limiting heuristic 
which does not employ look-ahead can completely account for this possi- 
bility. A breadth-limiting heuristic of this type is applied at each 
decision level, however, and in some sense it can "recover" from a 
drastic change in the probability distribution. This capability is 
derived from the consideration of the probability distribution at the 
current decision node. Thus, when a state which was formerly improbable
at one decision node becomes probable, it will automatically be
incorporated in the test selection scheme at the next level.
Unfortunately, this state may not become very likely until a large number
of tests have been run. If it is the actual state, its probability 
can remain low simply because the "wrong" tests are being run. Thus 
a doctor may fail to obtain a chest X-ray of a patient because it seems 
unlikely that the patient has tuberculosis, when this disease would 
become very probable if only the X-ray were taken. This, of course, 
is a general problem encountered with all test selection heuristics. 

The evaluation of the heuristic involves a comparison of the 
benefits of its tree-pruning power with the losses incurred from the 
sub-optimal testing strategies it produces. In general, a heuristic 
based on the current probability of various system states appears to 
be the most promising form of a breadth-limiting heuristic, but its
actual value can be determined only in the context of a particular 
diagnostic problem area. For example, in one area a breadth-limiting 
heuristic which restricts the search to tests relevant to the n most 
probable states may prove useful. In another area, tests relevant 
to all states with current probability greater than some threshold 
may be considered. Finally, in certain areas breadth-limiting
heuristics may be of no value regardless of the particular specification.
One of the areas explored in this research is that of evaluating
several breadth-limiting heuristics in particular diagnostic problem
areas. In such an evaluation, the capability of closed diagnosis may
be particularly valuable. 
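
The two variants just mentioned (the n most probable states, or all states above a probability threshold) can be sketched as follows. This is only an illustrative sketch in Python, not the thesis program, which is written in MAD and FAP; the function name `relevant_tests`, the dictionary layout, and all probabilities are assumptions made for the example.

```python
def relevant_tests(current, tests_for_state, n_most_probable=None, threshold=None):
    """Return the tests relevant to the states that pass the breadth filter.

    current         -- dict: state -> current probability (hypothetical layout)
    tests_for_state -- dict: state -> set of tests relevant to that state
    """
    # Rank states from most to least probable.
    ranked = sorted(current, key=current.get, reverse=True)
    if n_most_probable is not None:
        ranked = ranked[:n_most_probable]
    if threshold is not None:
        ranked = [s for s in ranked if current[s] > threshold]
    selected = set()
    for state in ranked:
        selected.update(tests_for_state.get(state, ()))
    return selected


# Illustrative numbers only.
current = {"tuberculosis": 0.6, "pneumonia": 0.3, "bronchitis": 0.1}
tests_for_state = {
    "tuberculosis": {"chest x-ray", "sputum culture"},
    "pneumonia": {"chest x-ray", "temperature"},
    "bronchitis": {"temperature", "throat exam"},
}
top_one = relevant_tests(current, tests_for_state, n_most_probable=1)
above_threshold = relevant_tests(current, tests_for_state, threshold=0.2)
```

With n = 1 only the two tuberculosis tests survive; with a threshold of 0.2 the pneumonia tests are admitted as well. As the text notes, nothing is pruned when every test is relevant to some retained state.
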

     As noted in the beginning of this section, there are two
general types of heuristics which reduce the number of decision nodes
considered in test selection: breadth-limiting and depth-limiting.
As the name of the latter implies, such heuristics limit the extent
of the look-ahead in the decision process for test selection. As
with breadth-limiting heuristics, there are several variations of
the depth-limiting heuristic to be considered.

     Perhaps the most obvious form of the depth-limiting heuristic
is one which sets a fixed depth of search for all branches of the
tree. Thus, given a particular decision node, the search would
proceed down all branches from that node to a depth k, where k is a
fixed number. The information derived from this search would then
be employed in a decision rule to determine the test to be run next.
The parameter k is a relative depth; that is, at a decision node at
level p, the search is conducted to a depth of p+k before making
the decision for level p. An alternative depth-limiting heuristic
might employ a variable depth look-ahead. Such a heuristic might
attempt to explore more "promising" branches to a greater depth than
less promising ones. The difficulty here is to decide which branches
are promising. It is, in fact, the general problem of heuristic
test selection all over again.

     There are several problems to be resolved in the development of
any depth-limiting heuristic. First consider the effect on the
decision process of limiting the depth of search. If the depth is
limited to k, then the "terminal" nodes will be characterized by
probability distributions for the unknown state. (See Figure 4.)
Since, in general, there will be a number of states with non-zero
probability at any given terminal node, there must be some way of
assessing the value of being at the node. One of the major problems
in the development of depth-limiting heuristics, then, is the
definition of measures of the desirability of nodes which do not
represent a certain diagnosis.

     One way of establishing the value of a node is suggested by
the presence of a loss function. The value of the node can be
obtained by assuming a decision about the unknown state is to be made
there. Then the prior distribution for the node and the loss
function can be employed to find the expected decision loss for the
node.¹ From this loss the value of the node is derived. While
this measure seems to be a natural one, it is not without its
weakness. The problem with the measure is that it is based on an
assumption which is generally untrue. In most cases, one will not
make terminal decisions at the nodes which are "terminal" for one
state in the look-ahead. For example, if the search depth is
limited to 2, the value measure assumes that a terminal decision
will be made two tests from this point. Since the actual terminal
decision may not be made until many tests have been run, this
measure distorts the value of tests considered for the current level.
The problem is that the values of the loss function at the decision
nodes of a given level may bear little relation to the values of
the best testing strategies which include these nodes. The
potential effectiveness of this "loss function" measure is difficult to
assess. The expectation is that it depends upon the particular
problem area in which the measure is employed.

¹ An additional assumption should be noted here. This is the
assumption that, given the prior distribution, the minimum expected
loss decision is made.
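
The "loss function" measure can be made concrete with a small sketch. It assumes, as elsewhere in this thesis, that the loss structure is a matrix whose element l_ij is the cost of diagnosing state j as state i, and that the minimum expected loss decision is taken at the node. The Python function name and the numbers are hypothetical and serve only as illustration.

```python
def expected_decision_loss(loss, distribution):
    """Return (best_decision, expected_loss) for a look-ahead "terminal" node.

    loss[i][j]      -- cost of diagnosing state j as state i
    distribution[j] -- probability of state j at this node
    """
    best_decision, best_loss = None, float("inf")
    for i, row in enumerate(loss):
        # Expected loss of announcing state i when the true state is uncertain.
        expected = sum(l_ij * p_j for l_ij, p_j in zip(row, distribution))
        if expected < best_loss:
            best_decision, best_loss = i, expected
    return best_decision, best_loss


# Two states; missing state 1 (row 0, column 1) is the costlier error.
loss = [[0.0, 10.0],
        [4.0, 0.0]]
distribution = [0.7, 0.3]
decision, node_loss = expected_decision_loss(loss, distribution)
```

Here the less probable state 1 is the minimum-loss decision (expected loss 2.8 against 3.0), which illustrates how strongly the measure depends on the loss function as well as on the distribution.
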

A second problem with this heuristic is its potential sensi- 
tivity to the actual loss function employed. If the heuristic is 
very sensitive to the loss function then uncertainties as to the 
true nature of this function may result in testing strategies which 
are decidedly sub-optimal. The problems of accurately assessing 
the loss function for a particular application will be discussed 
later in this thesis. 

The above discussion of breadth-limiting and depth-limiting 
heuristics purposely considered the two independently in order to 
make clear the considerations involved. The motivation for such 
heuristics in test selection is the desirability of reducing the 
number of decision nodes considered. Since the number of decision 
nodes is dependent on both the breadth and depth of the search, 
the heuristics employed in an actual problem will interact.
Generally speaking, the depth of the search can be increased only at
the expense of the breadth, because there is a constraint on the
total number of nodes to be considered. The particular balance of
these two heuristics may significantly affect the effectiveness of
the test selection process. An additional complication is introduced
by the possibility of changing this balance during the course of the
diagnosis. When many states are possible, it may be desirable to
limit the depth and allow full breadth. This is particularly true
if the prior distribution is quite diffuse. As the diagnosis
progresses and certain states are eliminated from further consideration,
the breadth of the tree may be reduced and the depth of search may
the breadth of the search is an important matter for investigation 
in the development of heuristic test selection schemes. 
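
The trade-off can be illustrated numerically. The sketch below is a simplification and not part of the thesis program: it assumes a uniform tree in which every node has the same number of successors (`breadth`) and counts nodes level by level. The function names and the node budget are hypothetical.

```python
def nodes_in_tree(breadth, depth):
    """Number of nodes in a uniform tree searched to the given depth."""
    return sum(breadth ** level for level in range(depth + 1))


def max_depth_within_budget(breadth, budget):
    """Largest look-ahead depth whose node count does not exceed the budget."""
    if breadth < 1:
        raise ValueError("breadth must be at least 1")
    depth = 0
    while nodes_in_tree(breadth, depth + 1) <= budget:
        depth += 1
    return depth


wide = max_depth_within_budget(breadth=10, budget=1000)   # many candidate tests
narrow = max_depth_within_budget(breadth=3, budget=1000)  # few candidate tests
```

Under these assumptions a budget of 1000 nodes permits a look-ahead of depth 2 at breadth 10 but depth 5 at breadth 3, which is the sense in which depth is bought at the expense of breadth.
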

More of the practical considerations involved in developing 
heuristics will be discussed in a later section describing the 
heuristics employed by the diagnostic program and their relative 
effectiveness. 



Chapter 4 
A DIAGNOSTIC SYSTEM 

The considerations outlined in the previous chapter led to 
the design and implementation of a diagnostic system. This system 
is composed of three major parts. The first is a set of programs 
which perform the actual diagnostic function. The second is a set 
of programs which facilitate the study of a variety of diagnostic 
problems and strategies. The third part of the system is the informa- 
tion structure which contains all the relevant information which 
these programs employ in performing diagnosis for a given problem 
area. While the content and, to some extent, the nature of the 
information structure vary with the particular application, it is 
convenient to consider this structure as a third general part of the 
diagnostic system. These three aspects of the diagnostic system will 
be discussed in detail in this chapter. 

The diagnostic system is currently operating on the Project 
MAC time-sharing system at the Massachusetts Institute of Technology. 
The diagnostic system is designed to exploit the interactive
capabilities of the time-sharing system. The programs of the diagnostic
system are written in MAD and FAP. They make very extensive use of
the SLIP-MAD system developed by Professor Joseph Weizenbaum of M.I.T.
The SLIP-MAD system (hereafter referred to as SLIP) is a set of list
processing functions embedded in the host language MAD. Because
discussions of SLIP are available elsewhere (R-20), only a brief
outline of the system is given here.

     The basic data structure employed in the SLIP system is a SLIP
list. A SLIP list is a list composed of cells, where a cell is a pair
of adjacent words of storage. The first word of the pair is divided
into an identifier field, a link-left field and a link-right field.
Each cell in a SLIP list contains a forward (right) link and a backward
(left) link. SLIP lists are symmetric in the sense that lists have
no particular orientation: the top and bottom of a list are equally
accessible. The identifier is used to indicate the type of element
stored in the second word of the cell. This element is referred to
as the datum. An example of a simple SLIP list is given in Figure 5.
Notice that any cell may contain an actual datum rather than a symbolic
designation for the datum.

Every SLIP list contains a special cell known as the header 
of the list. This cell contains the address of the first cell on 
the list in its right-link field and the address of the last cell on 
the list in its left-link field. Any storage location which contains 
the address of a list header in both its address and decrement fields 
is said to contain the name of that list. A SLIP list structure can 
be defined as a SLIP list whose data terms may themselves be names 
of SLIP lists. 

Figure 5
A Simple SLIP List

     There may be associated with any SLIP list a description list
or DLIST. If a SLIP list possesses a DLIST, the address of the header
of the DLIST is contained in the left-link of the datum of the
header cell. The DLIST, which is itself a SLIP list, is used to
store data pairs. A variety of SLIP functions are available for
creating and accessing these pairs.
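
For readers unfamiliar with SLIP, the structure described above resembles a circular, doubly linked list with a header cell. The following Python sketch is an analogy only; it does not reproduce the SLIP function names or the word-level storage layout, and the class and method names (`Cell`, `SlipLikeList`, `push_top`, and so on) are invented for illustration. Description-list pairs are stored as tuples for brevity.

```python
class Cell:
    """One list cell: identifier, left and right links, and a datum."""

    def __init__(self, datum=None, identifier=0):
        self.identifier = identifier  # type tag for the datum
        self.datum = datum
        self.left = self              # backward link
        self.right = self             # forward link


class SlipLikeList:
    def __init__(self):
        # The header's right link is the first cell, its left link the last.
        self.header = Cell()
        self.description = None       # optional description list (DLIST)

    def _insert_between(self, cell, left, right):
        cell.left, cell.right = left, right
        left.right = cell
        right.left = cell

    def push_top(self, datum, identifier=0):
        self._insert_between(Cell(datum, identifier), self.header, self.header.right)

    def push_bottom(self, datum, identifier=0):
        self._insert_between(Cell(datum, identifier), self.header.left, self.header)

    def top_to_bottom(self):
        cell = self.header.right
        while cell is not self.header:
            yield cell.datum
            cell = cell.right

    def bottom_to_top(self):
        cell = self.header.left
        while cell is not self.header:
            yield cell.datum
            cell = cell.left

    def set_description(self, attribute, value):
        # The description list is itself a list of the same kind.
        if self.description is None:
            self.description = SlipLikeList()
        self.description.push_bottom((attribute, value))


symptoms = SlipLikeList()
symptoms.push_bottom("fever")
symptoms.push_bottom("cough")
symptoms.push_top("rash")
symptoms.set_description("print name", "example list")
```

Because both ends hang off the header, reading from the top and reading from the bottom cost the same, which is the symmetry the text refers to. A datum may itself be another `SlipLikeList`, giving a list structure.
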

The SLIP library is a set of functions for manipulating SLIP 
lists. Typical functions permit the reading or searching of lists, 
additions to or deletions from lists, and the creation or erasure 
of lists. SLIP maintains an available space list, and the system 
includes an automatic garbage collection facility. 

Because the SLIP library consists of compiled subroutines 
which can be invoked from MAD or FAP programs, SLIP programs run 
at object speed. The fact that SLIP is embedded in an algebraic 
language, MAD, means the full arithmetic and logical capability 
of the latter is available to the programmer in a list-processing 
application. These two features make SLIP a convenient language to 
use in the implementation and debugging of a large list-processing 
application such as the diagnostic system developed in this research. 
For this particular application, the need for both the flexibility 
of list-processing and the algebraic power of MAD is well served by 
the SLIP-MAD system.

A. THE INFORMATION STRUCTURE FOR THE DIAGNOSTIC SYSTEM 
The manner in which the information relevant to a particular 
diagnostic problem area is organized has a considerable effect on 
the capabilities of the diagnostic program. The information contained 
in this structure for a particular application constitutes the
"experience" which the diagnostic program brings to bear on problems.
This experience includes relationships between observable attributes and 
states of the system to be diagnosed. For example, in an area of 
medical diagnosis, the information structure would contain the re- 
lationships between signs and symptoms and the appropriate diseases. 
Also included in the structure is information about the tests which 
are relevant to the given diagnostic area and their associated costs. 
Because of the probabilistic nature of many of the attribute-state 
relationships as well as other important relationships, the informa- 
tion structure must maintain a large number of individual probabilities. 
The general content of the information structure will be explained 
below. 

     The large number of states, attributes, and tests encountered
in many diagnostic areas places a premium on efficient searching of 
the information base during a diagnosis. The efficiency of search 
can be maintained at an acceptable level only through the proper organi- 
zation of the relevant information. 

A number of questions were considered in the design of the 
information structure currently employed by the diagnostic system. 
One of the principal questions was that of what information should 
be maintained in the structure. To a large extent, the particular 
diagnostic problem under investigation here determined the answer to 
this question. Since the model of diagnosis makes reference only to 
states, attributes, tests and various probabilities, these factors 
constitute the basic information blocks in the structure. Another
question is how, given the basic information blocks, these blocks 
should be related in order to facilitate access by the diagnostic 
program to the relationships which are significant in the deductive 
process of diagnosis. For example, the following questions typify 
the types of demands made on the structure. 

• What are the symptoms of pneumonia?

• Which diseases exhibit a rash on the arms as an
  attribute?

• What is the probability that a patient will have a
  temperature greater than 103° given that he has
  pneumonia?

The information structure described here was developed through 
the consideration of a number of alternative forms, although there 
obviously are other forms which might serve as well. To a certain 
extent, the information structure reflects the use of the SLIP 
system by the diagnostic program. For example, the information 
structure is a SLIP list structure. While in certain instances 
this results in inefficient utilization of main storage, this dis- 
advantage was more than offset by the convenience of being able to 
employ the full SLIP library in the development of the diagnostic 
system. 

     A basic information block in the structure is either a state,
an attribute, or a test. Each of these basic blocks is represented
by a SLIP list in the information structure. In what follows these
blocks will be referred to as state lists, attribute lists, or test
lists. A typical state list is depicted in Figure 6, in this instance
the state list corresponding to pneumonia in a medical diagnosis
problem. The list name of each attribute list relevant to pneumonia
appears on the state list for this disease. There are two data pairs
on the DLIST of each state list. The stored attributes are the a
priori probability of the state and the print name of the state. The
latter is the name by which the user of the program makes reference
to the state. In order to facilitate the retrieval of the state list
corresponding to a particular print name (as, for example, when the
user makes a request for information about the disease pneumonia),
all the state lists are grouped on a number of hash lists. Each hash
list is a sublist of a list called the master state list. The
retrieval of the state list corresponding to a particular print name is
effected as follows: First a SLIP function is used to map the given
print name onto the integers 0 to N-1, where N is the number of hash
lists on the master state list. If the integer K-1 results from this
mapping, the Kth hash list is searched for a state list with the
desired print name. Since the same hashing function is employed in the
creation of the master state list, the appropriate list will be found
if one exists. Roughly speaking, this technique reduces the average
search time for such requests to about 1/N of that required for a
search in the absence of hash lists.
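
The retrieval scheme can be sketched as follows. This is an illustration in Python rather than SLIP-MAD; the hashing function shown (a sum of character codes modulo N) is an assumption, since the actual SLIP mapping function is not specified here, and the record layout and prior probabilities are hypothetical.

```python
def hash_index(print_name, n_lists):
    """Map a print name onto the integers 0 .. n_lists - 1 (assumed hash)."""
    return sum(ord(ch) for ch in print_name) % n_lists


def build_master_list(state_lists, n_lists):
    """Group state lists on n_lists hash lists under one master list."""
    master = [[] for _ in range(n_lists)]
    for state in state_lists:
        master[hash_index(state["print_name"], n_lists)].append(state)
    return master


def find_state(master, print_name):
    """Search only the hash list selected by the print name."""
    for state in master[hash_index(print_name, len(master))]:
        if state["print_name"] == print_name:
            return state
    return None


# Illustrative a priori probabilities only.
states = [
    {"print_name": "pneumonia", "prior": 0.02},
    {"print_name": "tuberculosis", "prior": 0.01},
    {"print_name": "bronchitis", "prior": 0.05},
]
master = build_master_list(states, n_lists=4)
found = find_state(master, "pneumonia")
missing = find_state(master, "measles")
```

The same function must be used both to build the master list and to search it; otherwise a state list could be filed on one hash list and sought on another.
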

Figure 6
A Sample State List

     An attribute list includes the list names of all the test lists
corresponding to tests which can result in the given attribute.
The DLIST for an attribute list contains a data pair for the
attribute print name in addition to a special data pair for a member
list. The member list for an attribute list is a standard SLIP list
which contains the list name of each state list on which the name of
the attribute list appears and the corresponding probability of the
attribute given the state. Continuing the example above, Figure 7
depicts the attribute list for the attribute "fever." As in the case
of the state lists, each attribute list is a sublist of a hash list,
and each of these hash lists, in turn, is a sublist of the master
attribute list.

A test list contains the cost of the test and a DLIST. The 
DLIST contains the print name for the test and a member list for the 
attribute lists which include this test. In Figure 8 a simple test 
list is shown with a single cost (independent of state) and a deter- 
ministic member list. This is the form of test list used in this 
research although it would be relatively easy to make it more com- 
plex. As above, each test list is a sublist of a hash list, which 
is in turn a sublist of the master test list. A schematic of a
portion of the information structure is shown in Figure 9.

The presence of two-way links between attributes and states and 
attributes and tests results in a highly associative information 
structure. This associative property facilitates the accessing of 
information pertinent to a diagnosis. Thus a search for attributes 
given state and a search for states given attribute are equally effici- 
ent. Similarly the accessing of possible attributes resulting from a 
particular test is made straightforward by the presence of the member
list. 
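
The effect of the two-way links can be sketched with a pair of dictionaries that are always updated together. This is a Python analogy, not the SLIP list structure itself; the class name `InformationStructure`, its methods, and the probabilities are hypothetical. Combining several observed attributes by intersecting their member lists is likewise an assumption made for the example.

```python
from collections import defaultdict


class InformationStructure:
    def __init__(self):
        # state -> {attribute: P(attribute | state)}   (the state list)
        self.attributes_of = defaultdict(dict)
        # attribute -> {state: P(attribute | state)}   (the member list)
        self.states_with = defaultdict(dict)

    def associate(self, state, attribute, probability):
        """Record one attribute-state relationship in both directions."""
        self.attributes_of[state][attribute] = probability
        self.states_with[attribute][state] = probability

    def candidate_states(self, observed):
        """States that are associated with every observed attribute."""
        candidates = None
        for attribute in observed:
            members = set(self.states_with.get(attribute, {}))
            candidates = members if candidates is None else candidates & members
        if candidates is None:          # nothing observed: no pruning possible
            return set(self.attributes_of)
        return candidates


info = InformationStructure()
# Illustrative probabilities only.
info.associate("pneumonia", "fever", 0.9)
info.associate("pneumonia", "cough", 0.8)
info.associate("tuberculosis", "cough", 0.7)
info.associate("tuberculosis", "fever", 0.6)
info.associate("measles", "fever", 0.9)
info.associate("measles", "rash", 0.95)

symptoms_of_pneumonia = sorted(info.attributes_of["pneumonia"])
diseases_with_rash = sorted(info.states_with["rash"])
p_fever_given_pneumonia = info.states_with["fever"]["pneumonia"]
candidates = info.candidate_states(["fever", "cough"])
```

Each of the three sample questions posed earlier (symptoms of a disease, diseases exhibiting an attribute, and a conditional probability) becomes a single lookup, at the price of storing every relationship twice.
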

One example of the importance of this associative aspect of the 
information structure is its use by the diagnostic program in the 
initial "pruning" of the space of possible diagnoses in response to 
the observation of initial attributes. Generally, these initial 
attributes are presented as the user's statement of the problem.
For the program to operate in a reasonably efficient manner, it must 
use this initial statement of the problem to develop a drastically 
reduced set of states for further consideration. This is directly 
analogous to the "pruning" employed by a doctor when upon the observa- 
tion of a few initial signs or symptoms, he reduces the list of di- 
seases he considers as possible causes of the problem to a very small 
number relative to the set of all diseases. The diagnostic program 
would employ the member list for a given attribute list to rapidly 
determine the set of all diseases which were known to exhibit the 
corresponding attribute. While this reduction of the search space 
is crucial to the success of the program, it must not be irreversible 
if the program is not to be led astray by spurious information or 
noise. Since it is unreasonable to expect that those who prepare the 
information structure can anticipate all variations in attribute pat- 
terns for a given state, it is expected that the program at times will 
be confronted with problems involving attributes which are not
relevant to the principal problem. The strategies employed by the
program and the nature of the information structure have a strong effect
on the program capability in such a problem environment.

     The information structure currently employed by the diagnostic
program associates with each state only those attributes which are
relevant in the diagnosis of that state. Thus there would be no
association between the state "tuberculosis" and the attribute "sore
thumb" in the information structure for medicine.¹ The advantage of
this is that the size of the information structure is limited. Thus
while there may be many attributes, only a subset is associated with
any state. As will be discussed later, this creates problems in
performing diagnosis in a noisy environment. Certain routines
associated with the diagnostic program are responsible for making
decisions about the significance of the attributes observed in a
diagnosis. The function of these routines is also the subject of a
later section.

     The discussion of the information structure to this point has
implied that the attributes for a given state are taken to be
independent. Since in many cases the assumption of attribute independence
is not justified, it is necessary that inter-attribute dependencies
be representable in the structure. This capability is available in
the current program through the use of the clustering routine, the
relation-definition routine, and the relation interpreter.

¹ Since the program does not determine what information is
included in the structure, the user can associate any attributes and
states. The point is that certain associations are not expected.

     In order to provide a general capability for dealing with
inter-attribute dependencies, the diagnostic program must be able to
cope with a variety of relationships among attributes. The
important relationships most likely vary from one diagnostic problem
area to another. It does not seem advisable to attempt to catalog
these relationships within the program itself, since it is extremely
difficult to predict just which relationships will be required. Also,
if the relationships are incorporated within the program itself, it
is difficult to introduce new ones as they become of interest in a
particular problem area.

What is required then is a flexible facility for the program 
to accept new relationships and having so accepted a relationship, 
to incorporate it correctly in the inference process of diagnosis. 
In an attempt to provide this facility, the diagnostic program pro- 
vides the user with the means to define a variety of relationships 
among attributes. A relationship is defined by specifying as a 
Boolean function the conditions under which the relationship is 
true. This function is employed by the diagnostic program whenever 
it is necessary to determine whether the relationship is satisfied 
for a particular state. 

Consider, for example, the case in which it is necessary to 
account for the time of the appearance of certain attributes of a 
particular disease. Imagine that for the disease in question the 
attribute "rash" appears two days after the appearance of "fever."
Let the function BEFORE accept five arguments and be defined as

BEFORE (A1,A2,A3,A4,A5) =

(EQ (MINUS (CHAR A1 A2) (CHAR A3 A4)) A5)

Here EQ, MINUS, and CHAR are system primitives (defined by the
diagnostic program). The function CHAR is used to retrieve
characteristics of attributes. For example, the value of

(CHAR TIME FEVER)

is the time at which the attribute fever was observed.
By specializing the function BEFORE as

BEFORE (TIME, RASH, TIME, FEVER, 2)

the relationship for the disease in question can be checked.
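
The evaluation of such a relationship can be traced with a small Python transcription. The three primitives are modeled as ordinary functions; the table `observations` and the observation times in it (in days) are hypothetical and stand in for whatever store the diagnostic program uses for the characteristics of observed attributes.

```python
# Hypothetical record of observed attributes and their characteristics.
observations = {
    "FEVER": {"TIME": 3},
    "RASH": {"TIME": 5},
}


def CHAR(characteristic, attribute):
    """Retrieve a characteristic of an observed attribute."""
    return observations[attribute][characteristic]


def MINUS(a, b):
    return a - b


def EQ(a, b):
    return a == b


def BEFORE(a1, a2, a3, a4, a5):
    """True when (CHAR a1 a2) exceeds (CHAR a3 a4) by exactly a5."""
    return EQ(MINUS(CHAR(a1, a2), CHAR(a3, a4)), a5)


rash_two_days_after_fever = BEFORE("TIME", "RASH", "TIME", "FEVER", 2)
```

With fever observed on day 3 and rash on day 5, the specialized function is true. Had the rash appeared on day 4, it would be false and the relationship would not be satisfied for the disease in question.
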

Such relationships are defined by the DEFINE subroutine which 
the user can invoke as required. Relationships can also be built 
into the information structure when it is first established if they 
are known to be necessary. To define a relationship among the at- 
tributes of a particular state, one uses the CLUSTER routine. This 
routine re-organizes the state list for the state involved, producing 
an attribute cluster. Thus, for the example above, the reorganized
state list might look like that in Figure 6. As with individual
attributes, a conditional probability given state is associated with
each attribute cluster.

     Any number of relationships can be defined for the structure
provided that they can be expressed in the prescribed manner.
Complex relationships can be specified by using functions of functions.
Note that attributes remain independent for any state unless a
relationship involving them is defined for that particular state.
Thus, in one disease "fever" and "rash" may be related in some way,
while in another they may be independent.

The diagnostic program employs an interpreter to determine the 
truth of relationships during diagnosis. The interpreter permits 
the correct incorporation of relationships in the diagnostic infer- 
ence. The manner in which the interpreter is employed will be ex- 
amined in detail later. 

B. THE DIAGNOSTIC PROGRAM 
The diagnostic program and its associated routines are the 
heart of the system. These programs embody the various diagnostic 
strategies employed by the system. When one uses the system in 
the solution of a diagnostic problem, he interacts with the diag- 
nostic program. This program provides the interface between the 
user and the facilities of the system. There are three basic 
functions performed by the diagnostic program. (Although, in fact, 
each of these functions is delegated to a set of subroutines, it 
is convenient to consider them as logical functions of the diagnos- 
tic program.) In brief these three functions are: 

1) The interpretation of the attributes of a particular
   problem based on the information contained in the
   information structure. This function is called the
   inference function.

2) The selection of tests for the user to apply to the
   system being diagnosed in order to obtain further
   clues as to the system state. This is the test
   selection function.

3) The analysis of the attributes of a problem to
   determine whether there are irrelevant attributes
   present or to detect attribute patterns from more
   than one system state occurring simultaneously.
   This is the pattern-sorting function.

The design of the diagnostic program permits the alteration or 
replacement of any of these three functions independently of any of 
the others. This flexibility is important because these functions
are fundamental to this scheme for diagnosis, and it is necessary 
to study different versions of the functions. The possibility of 
changing individual functions without changing the remainder of the 
program greatly facilitates this study. 

Before the diagnostic program can be used in a particular 
problem area, an information structure for that area must be es- 
tablished. This requires that a disk file containing all the rele- 
vant information be created. The disk file can be created using 
the standard input and editing facilities of the time-sharing system.
The formatting of the file, although specified, is quite simple, and if
the necessary information is available, the only difficulty in 
creating the file is dealing with the large amount of information 
which may be required. The information in the file consists of state- 
attribute relationships and test cost data. An example of a portion 
of such an input file is shown in Appendix 1. A system program 
processes the input file and from it constructs the information 
structure for the problem area. 

A second file containing the loss structure for the problem 
area is required by the diagnostic program. At present this loss 
structure is always a matrix. Any element of this matrix, l_ij, is
the estimate of the loss for diagnosing state j as state i. The 
exact manner in which this information is employed will be made 
clear below. 

As a preface to the discussion of the logical functions of 
the diagnostic program, consider this example of a particular 
application of the program. Suppose the program currently is set 
up to diagnose a certain group of diseases. This means that the 
appropriate information structure and loss structure have been es- 
tablished. A user wishing to invoke the assistance of the program 
does so by providing an initial problem statement. This statement 
is essentially a list of the attributes which have been observed. 

Assume for the example that this list is

• temperature of 102°

• severe coughing

• sore right ankle

     As indicated in Figure 11, the program first invokes the pattern
sorting function for the current attributes. In this case, the 
pattern sorting function hypothesizes that the attribute "sore right 
ankle" is not relevant to the principal medical problem of the patient, 
and so removes it from the list of attributes for later investigation. 
After the attributes have been processed by the pattern sorting func- 
tion, the set of all diseases which exhibit the relevant attributes 
is obtained and a probability distribution for diseases given these 
attributes and the "experience" in the information structure is 
created. The creation of this probability distribution is the task 
of the inference function. This distribution results from a considera- 
tion of both the current attributes and the knowledge of the various 
diseases. It is the current view of the diagnostic problem assumed 
by the program. 

Now the program invokes the test selection function. The 
object of this function is to select a good test for the user to 
apply to the patient in order to gain more information. In selecting 
this test, the test selection function considers the current proba- 
bilities of the various diseases, the cost of each test, and the 
usefulness of the results expected from the test. The user is in- 
formed of the test which has been selected. The test may be as 
simple as asking the patient questions about his recent exposure to 
other sick persons, or it may be more involved, for example, a chest 
X-ray. In any event, when the user has obtained the results of the 
test, he reports them to the program. These test results are new
attributes and the program again enters the loop shown in Figure 11. 
This dialogue with the user continues until a diagnosis has been ob- 
tained. A more detailed trace of a session with the diagnostic 
program is presented in Appendix 2. This brief example provides an 
overview of the operation of the diagnostic program. In what follows, 
each of the primary functions of the program will be discussed in 
detail. 

1. THE PATTERN-SORTING FUNCTION 
As explained in an earlier section, only those attributes sig- 
nificant to the diagnosis of a particular state are associated with 
that state in the information structure. Thus the attribute "sore 
ankle" would not be associated with the disease tuberculosis in the 
information structure; this means that the name of the attribute list 
for the attribute "sore ankle" does not appear on the state list for 
the disease "tuberculosis". Similarly the member list of the attrib- 
ute list for "sore ankle" contains no entry for the state list of 
tuberculosis. If the name of a state list does not appear on the 
member list of a given attribute list, then the conditional probability 
of the attribute given the state is taken to be zero by the program. 
As will be discussed in the following section, the particular method 
of deduction employed by the program (Bayes' rule) results in a zero
posterior probability for the state given the attribute. For instance, 
if in the course of a diagnosis in which tuberculosis was considered
a possible cause of the attributes the attribute "sore ankle" were 
observed, the updated probability of tuberculosis would be zero. 
Since the program removes from current consideration any state with 
zero probability, this approach makes maximum use of each attribute 
to reduce the set of possible diagnoses. 

The problem encountered here is that while "sore ankle" is 
not an attribute of tuberculosis, one certainly can have tuberculosis 
and a sore ankle. This is but one example of the more general problem
of irrelevant or noise attributes. Unless special precautions
are taken, such attributes can eliminate the actual state from con- 
sideration when processed by the inference function. A number of 
solutions to this problem are possible. 

One approach is to associate every attribute with every state,
employing $\varepsilon$ probabilities whenever an attribute is not considered
relevant to the diagnosis of a particular state. [Footnote: This
probability might be taken to be the unconditional probability of
the attribute. Since this probability may be quite small, the
problem discussed here could still be encountered.] As long as $\varepsilon$ is
greater than zero, no state will be eliminated from consideration in
the manner described above. The difficulty is that this method prevents
the drastic reduction in the set of possible diagnoses which
is necessary for efficient operation of the program. A second approach
is to employ the $\varepsilon$ probabilities as above, but to eliminate
from further consideration those states whose posterior probability
falls below a fixed threshold. This method is unsatisfactory be- 
cause the posterior probabilities for the various states can undergo 
radical change as additional attributes are observed and employed 
by the inference function. Thus, there is no guarantee that a state 
with a very low probability in the early stages of the diagnosis will 
remain improbable with the observation of new attributes. This 
problem can be even more severe if the noise attributes are the first 
observed. In either event, the actual state may be removed from fur- 
ther consideration by this method. Another approach is to decide 
whether an attribute is relevant to the diagnosis or merely noise 
before it is processed by the inference routines. This is a very 
difficult task to accomplish given the particular model employed in 
diagnosis by the program. The model of the system being diagnosed 
consists principally of state-attribute relationships without any 
information about causal connections. Thus, the only way to evalu- 
ate the relevance of an attribute to the diagnosis is to consider 
some measure of its probability given the diagnosis to date. Since 
almost every measure of this kind depends on the current prior dis- 
tribution, which, in turn, depends on the observed attributes assumed 
to be relevant, a cyclical argument results. 

A second problem arises when attributes characteristic of two or 
more distinct states are observed, as in the case of an individual 
with more than one disease. This is more than a problem of simple 
noise since the program must detect two or more patterns. Again
the methods mentioned above are inadequate to cope with this problem.

The solution to this problem which has been incorporated in 
the diagnostic program involves processing a number of attribute 
patterns in parallel during a diagnosis. A pattern is a subset of the 
set of observed attributes which has the following two properties: 
1) At least one state in the information structure exhibits all the 
attributes in the pattern with a non-zero probability and 2) The 
pattern is not a subset of any other pattern. If the set of observed 
attributes contained a number of the attributes of tuberculosis and 
the attribute sore ankle, one pattern would be the set of tuberculosis 
attributes. A second pattern would be obtained by choosing a disease 
for which sore ankle is an attribute and taking the intersection of 
the set of attributes for that disease and the set of observed at- 
tributes. Perhaps the set of attributes obtained in this way, using 
a second disease on the member list of "sore ankle," might be dif- 
ferent from both those previously obtained. If so, this set is still 
another pattern. 
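
The two defining properties of a pattern can be illustrated with a short sketch. It is not the program's own code: the function name `find_patterns`, the dictionary layout, and the disease and attribute lists are all hypothetical, chosen only to mirror the tuberculosis and "sore ankle" example. Each candidate is the intersection of one state's attribute list with the observed attributes (property 1), and only candidates that are not proper subsets of another candidate are kept (property 2).

```python
def find_patterns(state_attrs, observed):
    """Return the maximal non-empty intersections of the observed attribute
    set with each state's attribute list."""
    # Property 1: every candidate is exhibited in full by at least one state.
    candidates = {frozenset(attrs & observed) for attrs in state_attrs.values()}
    candidates.discard(frozenset())
    # Property 2: drop any candidate that is a proper subset of another one.
    maximal = [c for c in candidates if not any(c < other for other in candidates)]
    return sorted(maximal, key=sorted)


# Hypothetical information structure; the attribute lists are illustrative only.
state_attrs = {
    "tuberculosis": {"fever", "coughing", "weight loss"},
    "sprain": {"sore ankle", "swelling"},
    "gout": {"sore ankle", "fever"},
}
observed = {"fever", "coughing", "sore ankle"}

patterns = find_patterns(state_attrs, observed)
for pattern in patterns:
    print(sorted(pattern))
```

With these assumed lists, the candidate containing "sore ankle" alone is discarded because it is contained in the larger candidate obtained from the second disease on the member list of "sore ankle."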

Throughout the course of a diagnosis, a pattern stack is main- 
tained by the pattern-sorting function. A schematic of the pattern 
stack is presented in Figure 12. Each pattern is represented by a 
sublist of the pattern stack, and associated with each pattern is 
the probability distribution for the states of the system given the 
attributes of the pattern. 

[Figure 12. Pattern Stack (schematic; attributes shown include FEVER and COUGHING)]

Whenever a new attribute is obtained in a diagnosis, it is
processed against every pattern in the pattern stack. The new at- 
tribute is used to update a pattern if it is relevant to at least 
one state in the probability distribution for the pattern. After 
this updating, the attribute is added to the pattern. If no state 
in the probability distribution of a pattern is known to exhibit the 
new attribute, no changes are made to either the pattern or the dis- 
tribution. The actual manner in which distributions are updated to 
account for new attributes is discussed in detail in the next sec- 
tion on the inference function. 

When the new attribute has been processed against all patterns, 
a routine called PATFRM is invoked to form new patterns if possible.
PATFRM retrieves the member list of the attribute list corresponding 
to the new attribute. For each state on the member list, the set of 
probability distributions in the pattern stack is searched. If the 
state is found in this set, the pattern for the state is already in 
the pattern stack. If the state is not found, the intersection of 
the set of attributes denoted by the appropriate state list and the 
set of observed attributes is a new pattern. This pattern and the 
corresponding distribution for the states is added to the pattern 
stack. While it is conceivable that this procedure could generate 
many patterns for a given information structure and attribute sequence,
this is not a serious problem. First, in most areas the number
of distinct patterns which can be formed by this procedure for a
given attribute set is quite limited, because states exist in
groups which have overlapping attribute patterns. Secondly, the 
number of patterns considered can be limited by considering only 
those patterns with a probability greater than some threshold. 

This procedure also includes a provision for removing patterns 
from the stack. If the inference function determines that the 
probability of a particular pattern is zero, the pattern and its 
associated distribution are eliminated from the pattern stack. The
contents of the pattern stack, then, can be quite dynamic during 
the course of a diagnosis as new attributes trigger the addition 
and deletion of patterns. 
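
The processing of one new attribute can be sketched as follows. This is a simplified illustration under stated assumptions, not the program's implementation: patterns are plain dictionaries, `cond[state][attr]` holds a positive conditional probability for every attribute on a state's list, attributes are assumed independent, and the names `bayes_update` and `process_attribute` are hypothetical. The deletion of zero-probability patterns and the optional probability threshold are not modelled.

```python
def bayes_update(dist, attr, cond):
    """Bayes' rule for one attribute; states with zero posterior are dropped."""
    weighted = {s: p * cond[s].get(attr, 0.0) for s, p in dist.items()}
    total = sum(weighted.values())
    return {s: w / total for s, w in weighted.items() if w > 0.0}


def process_attribute(stack, observed, attr, priors, cond):
    """Process one new attribute against the stack, then form new patterns."""
    observed.add(attr)
    # Update every pattern for which the attribute is relevant to some state.
    for pattern in stack:
        if any(cond[s].get(attr, 0.0) > 0.0 for s in pattern["dist"]):
            pattern["dist"] = bayes_update(pattern["dist"], attr, cond)
            pattern["attrs"].add(attr)
    # PATFRM-like step: examine each state on the attribute's member list.
    for state in sorted(cond):
        if cond[state].get(attr, 0.0) <= 0.0:
            continue
        if any(state in pattern["dist"] for pattern in stack):
            continue  # a pattern for this state is already in the stack
        attrs = set(cond[state]) & observed
        dist = dict(priors)
        for a in sorted(attrs):
            dist = bayes_update(dist, a, cond)
        stack.append({"attrs": attrs, "dist": dist})


# Hypothetical two-state information structure.
priors = {"X": 0.5, "Y": 0.5}
cond = {"X": {"A": 0.8, "B": 0.5, "D": 0.5}, "Y": {"A": 0.4, "C": 0.9}}
stack, observed = [], set()
for attr in ["A", "B", "C", "D"]:
    process_attribute(stack, observed, attr, priors, cond)
for pattern in stack:
    print(sorted(pattern["attrs"]), sorted(pattern["dist"]))
```

In this run the attribute C is not relevant to the first pattern, so it opens a second pattern rather than eliminating state X.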

As an illustration of this aspect of the pattern sorting 
function, consider the following example. At a given stage in a 
diagnosis of a medical problem, three attributes have been observed. 
These attributes are A, B and C. Also assume that of the diseases 
represented in the information structure, none exhibits all three 
of these attributes. A number of diseases exhibit A and B as at- 
tributes, however, and so this is a pattern. The point here is that 
while a disease which exhibits A and B can occur with C also present, 
C is not considered relevant to the diagnosis of any of these 
diseases. For the diseases for which C is a relevant attribute 
A is also relevant. For this situation the pattern stack can be 
represented as in Figure 13A. Here the symbol $\pi$ denotes the distribution
list for a pattern.

Now when the new attribute D is observed, it is processed
through the pattern stack. Assuming that the new attribute is relevant
to some of the states in distribution $\pi_1$, this distribution is
updated by the inference function to produce $\pi_1'$ and the attribute D
is added to the pattern. Attribute D is not relevant to the second
pattern in the stack, and so this pattern and its associated distribution
remain unchanged. Finally, the routine PATFRM is invoked
to search for new patterns. Assume that no new patterns are formed. 
Thus, at the end of this phase in the processing of the new attrib- 
ute, the pattern stack appears as in Figure 13B. 

Now in the event that there is more than one pattern in the 
stack, the diagnostic program must make a decision as to which 
pattern to diagnose. Thus, the program must generate a hypothesis 
about the significance of the various patterns in the stack. For 
example, if one pattern corresponds to a majority of the attributes 
of tuberculosis, and the other to a single attribute "sore ankle," 
it is extremely important for the program to give priority to the 
former pattern. The problem is to establish pattern selection rules 
which will make the "correct" decision in such a case. 

One consideration which is relevant to the selection of a pat- 
tern is the seriousness of the states suggested by the pattern. For 
this reason, an attribute quite specific to a very serious disease 
will strongly influence the course of a medical diagnosis. 

In order to account for the relative seriousness of different
[...]
and $\pi_j$ = a priori probability of state j.

Values of $\theta$ decrease with increasing seriousness of states. This
can be seen in the following simple example.



State                 A priori probability    LOSS         $\theta$
1. Benign tumor       0.7                     1,000,000    300,000
2. Malignant tumor    0.3                     100          70



While other more sophisticated measures of seriousness can be de- 
veloped, this simple one was deemed suitable for the purposes of 
this research. 

Once the seriousness of the various states has been estab- 
lished, the problem of pattern selection can be solved in a quite 
reasonable way through the use of the Bayesian model. For each 
pattern, a conditional distribution on states can be obtained by 
the inference function. For each pattern, the distribution is con- 
ditioned on the attributes of that pattern alone — all other patterns 
are ignored. Thus for the k-th pattern

[...]

where $\pi_{jk}$ is the conditional probability of the j-th
state ($M_j$) given the pattern ($S_{1k}, \ldots, S_{mk}$).

The seriousness measure for the k-th pattern is given by

[...]

and the pattern selected is the one with the minimum value of this measure.

This measure has several desirable properties. Consider the 
case of an attribute which is very specific to a serious disease. 
If that attribute is observed, the conditional probability for the 
serious disease given the pattern containing the attribute will be 
close to one. Since the corresponding value of $\theta$ is small, the
value of the seriousness measure for the pattern will be small. Hence
this pattern will quite likely be selected. On the other hand, if the
attribute is not specific to the serious disease, the conditional
probability for the disease given the pattern will be less, and the
resulting value of the measure greater.

The measure also favors a pattern which contains many attributes
provided that the pattern strongly indicates one or more serious
states. The posterior distribution does not have to be spiked, how- 
ever, for a pattern to be chosen. For example a pattern which re- 
sults in equal probabilities for six states may also be chosen if 
the seriousness of the individual states so warrants. This measure 
accounts for both the specificity of a pattern and the seriousness 
of states associated with the pattern. In this respect, it seems to 
be a good way to select patterns for investigation. 
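
One form of seriousness measure that behaves in the way described is a $\theta$-weighted average of a pattern's conditional distribution. That form is an assumption made here for illustration only; the sketch below does not claim to reproduce the measure actually used by the program. The names `seriousness` and `select_pattern`, the state names, and all numerical values are hypothetical.

```python
def seriousness(dist, theta):
    """Assumed measure: theta-weighted average over the pattern's distribution.
    Smaller values correspond to more serious patterns."""
    return sum(theta[state] * p for state, p in dist.items())


def select_pattern(stack, theta):
    """Return the index of the pattern with the minimum measure."""
    return min(range(len(stack)), key=lambda k: seriousness(stack[k], theta))


# Hypothetical theta values: a small theta marks a serious state.
theta = {"serious": 70.0, "mild_a": 300000.0, "mild_b": 250000.0}
stack = [
    {"mild_a": 0.6, "mild_b": 0.4},   # pattern 0: only mild states
    {"serious": 0.9, "mild_a": 0.1},  # pattern 1: strongly indicates the serious state
]
current = select_pattern(stack, theta)
print(current, [seriousness(d, theta) for d in stack])
```

Under this assumed form, the pattern that strongly indicates the serious state receives the smaller value and is selected.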

A routine called SELECT chooses the current pattern for the 
diagnostic program, and this pattern may change from time to time 
as additional information is gathered by the program. The current 
pattern is the one employed by the test selection function for 
evaluating tests. Before each use of the test selection function,
SELECT chooses the current pattern based on all information cur- 
rently available. 

A number of other processing routines affect the pattern stack 
during the course of a diagnosis. Recall that whenever the pattern 
sorting function produces more than one pattern in the stack, the 
selection of a pattern for further diagnosis constitutes a hypothe- 
sis about the significance of a group of attributes. If a consist- 
ent diagnosis for the current pattern is obtained, then the hypothe- 
sis is tentatively confirmed. If there are no other attributes to 
account for then a consistent diagnosis for all attributes has been 
obtained. Otherwise the remaining patterns must be considered. It
is possible that, while a second pattern is being diagnosed, new attributes
may prove the hypothesis about the first pattern to be incorrect.
In this case, the attributes in this pattern can no longer be con- 
sidered accounted for. These possibilities are dealt with in the 
following way by the pattern sorting function. The program maintains 
a list called the "unaccounted-for" list, and on it are all those 
attributes which have yet to be attributed to a particular system 
state. When the current pattern is "diagnosed" or assigned to one 
state, the attributes in the pattern are removed from the unaccounted-
for list, and the pattern itself is marked. A marked pattern is ig- 
nored in test evaluation, although it is updated with new attributes 
whenever appropriate. When the current pattern has been marked, all 
unmarked patterns are deleted from the stack. Then PATFRM is called 
for each attribute in the unaccounted-for list. Patterns are formed
using the unaccounted-for list as the total attribute set. If the 
unaccounted-for list is empty, a consistent diagnosis for all attrib- 
utes has been obtained. Otherwise, the diagnosis continues on the new 
patterns. 

This means that attributes which are included in marked patterns 
are not utilized in the formation of new patterns at this time. If, 
for example, the total attribute set were (A, B, C, D) and (A, B, D) 
had been tentatively diagnosed, the only unmarked pattern would be 
(C). This is true even though there may be states which exhibit both
C and A. If, however, the test selection function chooses a test
which can detect A, A will be added to the unmarked pattern. This is
because the program always consults the history of the diagnosis before
requesting the user to run a test. If, on the other hand, the
program would normally account for C without employing knowledge of 
A, it will do so. 

If a new attribute causes the probability of a marked pattern to 
become zero, a special recovery procedure is invoked. First, each 
attribute of the marked pattern is transferred to the unaccounted-for 
list. If one of these attributes is added to the list, it is also 
processed against all the other patterns in the stack. When the stack 
has been updated with such an attribute, PATFRM is invoked to check 
for new patterns based on this attribute. Finally, the marked pattern 
is deleted from the pattern stack, and diagnosis continued. 
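
The bookkeeping for marked patterns and the unaccounted-for list can be sketched as below. This is an illustrative simplification: the `Pattern` class and the functions `mark_diagnosed` and `recover` are hypothetical names, probability distributions are omitted, and the re-forming of patterns (the PATFRM step) is left to the caller, who receives the attributes that need to be processed.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    attrs: set
    marked: bool = False


def mark_diagnosed(stack, current, unaccounted):
    """Mark the diagnosed pattern and delete every unmarked pattern."""
    unaccounted -= current.attrs
    current.marked = True
    stack[:] = [p for p in stack if p.marked]
    # New patterns are then formed from these attributes only.
    return sorted(unaccounted)


def recover(stack, failed, unaccounted):
    """Undo a marked pattern whose probability has become zero."""
    returned = sorted(failed.attrs - unaccounted)
    unaccounted |= failed.attrs
    stack.remove(failed)
    # These attributes must be processed again against the remaining patterns.
    return returned


unaccounted = {"A", "B", "C", "D"}
first = Pattern({"A", "B", "D"})
second = Pattern({"A", "C"})
stack = [first, second]

remaining = mark_diagnosed(stack, first, unaccounted)
print(remaining)                  # attributes still to be explained
returned = recover(stack, first, unaccounted)
print(returned, sorted(unaccounted))
```

The first call reproduces the (A, B, C, D) example: once (A, B, D) is tentatively diagnosed, only C remains on the unaccounted-for list.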

Thus, the contents of the pattern stack may be quite volatile
during a diagnosis, although cases of extreme volatility are not ex- 
pected to occur very often. In any event, the use of the pattern 
stack permits the program to deal with noise and multiple patterns 
in a reasonably efficient manner. By allowing the user to inter- 
act with the program during diagnosis, it is possible to employ his 
judgment with regard to the merits of pursuing particular patterns. 

2. THE INFERENCE FUNCTION 

In general, the observation of a new attribute provides the 
diagnostic program with additional information about the current 
state of the system being diagnosed. Based on this observation, the 
program may significantly alter its estimate of the likelihoods of 
the various states. This section discusses in detail the manner in 
which the program incorporates observations of attributes into its 
current view of the diagnostic problem. The routines which process 
new attributes for their effect on the current view of the problem 
collectively are called the inference function. 

The basic analysis of attributes and inference done by the diagnostic
program is based on Bayes' rule. Bayes' rule can be stated as
follows:

$$P(M_j \mid S_t, \mathcal{E}) = \frac{P(M_j \mid \mathcal{E})\, P(S_t \mid M_j, \mathcal{E})}{P(S_t \mid \mathcal{E})}$$

where $P(M_j \mid \mathcal{E})$ is the probability that the current state is $M_j$
conditional on the total experience to date.

$P(S_t \mid M_j, \mathcal{E})$ is the probability that the system will exhibit
attribute $S_t$ given that it is in state $M_j$ and the diagnostic
experience $\mathcal{E}$.

$P(S_t \mid \mathcal{E})$ is the probability of the system exhibiting $S_t$
unconditional on state.

$P(M_j \mid S_t, \mathcal{E})$ is the conditional probability that the state of
the system is $M_j$ given $\mathcal{E}$ and the newly observed attribute $S_t$.

The quantity $P(M_j \mid \mathcal{E})$ is called the prior probability and
$P(M_j \mid S_t, \mathcal{E})$ is called the posterior probability of the state $M_j$.
The observation of the attribute $S_t$ increases the experience or information
available on which to make a decision about the unknown state. The
posterior probability is an adjustment of the prior probability to
account for the new information. After this adjustment has been made,
the posterior probability is the new prior probability when further
attributes are observed. Consider the following example of this
basic inferential process:

Suppose there are only two states relevant to the current diagnostic
problem, $M_1$ and $M_2$, and three attributes $S_1$, $S_2$ and $S_3$. The
a priori probabilities for the two states as well as the conditional 
probabilities for the attributes given the states are presented in 
Table 2. 

TABLE 2
EXAMPLE FOR BAYESIAN ANALYSIS

           A priori        Conditional probability of attribute / state
State      probability     $S_1$      $S_2$      $S_3$
$M_1$      0.8             0.8        0.4        0.1
$M_2$      0.2             0.7        0.6        0.9

The initial experience of the program, before any attributes have been
observed, is embodied in the a priori probabilities. Thus, the current
distribution on states is (0.8, 0.2). Now assume that tests
employed in the diagnosis reveal the presence of attribute $S_1$. According
to Bayes' rule, the posterior distribution is (0.82, 0.18).
That is,

$$P(M_1 \mid S_1, \mathcal{E}) = \frac{(0.8)(0.8)}{(0.8)(0.8) + (0.2)(0.7)} = 0.82$$

$$P(M_2 \mid S_1, \mathcal{E}) = \frac{(0.2)(0.7)}{(0.8)(0.8) + (0.2)(0.7)} = 0.18$$

Thus, the new attribute has little effect on the view of the problem
taken by the program. If two more tests yield the attribute $S_2$ and
then the attribute $S_3$, the corresponding distributions are:

$$P(M_1 \mid S_1, S_2, \mathcal{E}) = 0.75 \qquad P(M_2 \mid S_1, S_2, \mathcal{E}) = 0.25$$
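
The sequential updating in this example can be checked with a few lines of code. The sketch assumes independent attributes and takes the conditional probability of the first attribute given the first state as 0.8, the value used in the worked equations; the function name `bayes_update` and the list layout are hypothetical. The distribution after the third attribute is simply what the same rule produces when applied once more.

```python
def bayes_update(prior, likelihood):
    """One application of Bayes' rule over a list of states."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)  # probability of the attribute, unconditional on state
    return [j / total for j in joint]


prior = [0.8, 0.2]  # a priori probabilities of M1 and M2
likelihoods = {
    "S1": [0.8, 0.7],  # P(S1 | M1), P(S1 | M2)
    "S2": [0.4, 0.6],
    "S3": [0.1, 0.9],
}

history = []
dist = prior
for attr in ("S1", "S2", "S3"):
    # The posterior after each attribute becomes the prior for the next one.
    dist = bayes_update(dist, likelihoods[attr])
    history.append([round(p, 2) for p in dist])
    print(attr, history[-1])
```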

[...] pattern and its distribution list are removed from the stack. While
Bayes' rule is easily applied in principle, the inference function
must include special routines to insure that inter-attribute rela- 
tionships and the "history" of the diagnosis are correctly accounted 
for in the probabilistic analysis. 

The routine UPD which performs the updating of the pattern stack 
based on the observation of a new attribute is to a large extent a 
simple encoding of Bayes' rule. The routine, however, does not obtain
the requisite conditional probabilities directly. Instead, it
calls PIJ to obtain the conditional probability of attribute "j"
given state "i" and the history of the diagnosis to date. The reason
for this indirection in the accessing of probabilities is really a
pragmatic one. The insulation of UPD from the probability-retrieving
process allows changes in this process to be made without affecting 
the basic inference process. 

As noted, the function of PIJ is to retrieve conditional proba- 
bilities from the information structure. In the simplest case, this 
involves retrieving a number directly from the information structure. 
When the attribute of interest is involved in an attribute cluster for 
the given state, the process of determining the conditional probability 
is more involved. 

The general form of an attribute cluster is either

a. $(\theta_1 R_1)$

or

b. $(\theta_1 R_1 \oplus \theta_2 R_2 \oplus \cdots \oplus \theta_n R_n)$

where $R_j$ is an inter-attribute relationship;
$\theta_j$ is the conditional probability of $R_j$ given the state;
$\oplus$ is either "exclusive or" or "or."

Here $R_j$ can be any inter-attribute relationship (including functions
of functions, etc.) as long as it does not include $\oplus$. The reason for
this restriction is to eliminate ambiguity from the probability 
assignments. In fact, the restriction does not limit the class of 
logical relationships which can be defined, only the form which individual
members may assume. Thus, for example, $R_j$ might be the
cluster for the relationship

"Either $A_1$ precedes $A_2$ in time or $A_1$ does not appear at all."

In order to evaluate the conditional probability of an attribute
involved in an attribute cluster, PIJ must be able to evaluate the
truth of the relationships $R_j$. It does this by calling the routine
INTERP to determine the truth value of each $R_j$. INTERP is an interpreter,
which retrieves the definitions of any functions involved in
$R_j$ and applies these definitions to the appropriate arguments from
the attribute cluster. The interpreter employs a push-down stack and 
recursive calls in the evaluation. All functions are reduced in this 
way to their component primitive functions. Routines to evaluate the 
primitive functions are built into the system. 

The operation of the interpreter differs in certain aspects 
from that of a normal interpreter of Boolean functions, because this 
interpreter must deal with variables whose current value is unknown. 
For example, suppose the relationship under consideration for a particular
state $M_j$ is "$A_1$ precedes $A_2$ in time" with probability 0.5.
Assume $A_1$ has just been observed and the conditional probability of
$A_1$ given the state is desired. If $A_2$ has not yet been observed,
the relationship is incomplete (or, from a logical standpoint, undefined).
From a Bayesian point of view, however, the conditional
probability is well-defined; it can be obtained by assuming that $A_2$
will in fact follow $A_1$ in time. This assumption results in a value
of 0.5 for the conditional probability of $A_1$ given $M_j$. If $A_2$ is
observed later, then its conditional probability can be obtained in a
similar manner, but the prior observation of $A_1$ must be taken into
account. This means that the desired probability of $A_2$ is conditional
on the state $M_j$ and the previously observed $A_1$. Hence the proper
conditional probability is 1.0.

In general terms, the interpreter assumes the truth of any 
relationship which is incomplete unless that relationship is demon- 
strably false given the current information of the diagnosis. The 
interpreter must also indicate whether any attributes involved in a 
cluster have actually been observed. Given these modifications of the 
interpreter function, the routine PIJ can deduce the proper conditional 
probability for the given attribute-state pair. PIJ embodies a number 
of logical tests on the truth of the $R_j$ and the number of observed
attributes involved in each. For the types of relationships allowed 
in the information structure, these quantities are sufficient to determine
[...] the relevant attributes are first presented by the user. Through
the use of the interpreter, the diagnostic program is able to deal
with a variety of relationships within a particular problem area.

3. THE TEST SELECTION FUNCTION 

The value of heuristics for test selection in diagnostic prob- 
lems has been underscored in previous sections. In this section, a 
particular test selection program is discussed. This program (which 
is, in fact, a number of subroutines) is the one employed in the diag-
nostic program. The nature of the program strategy and organization 
is explained and some of its limitations are noted. 

From the model of a diagnostic problem discussed in Chapter 3, 
it will be recalled that one of the major tasks in diagnosis is the 
selection of a good set of tests to apply to the system. The de- 
termination of such a testing strategy involves a consideration of 
both the costs of tests and the information which they are expected to 
yield. Thus, any heuristic for the test selection process should re- 
flect these considerations. Another consideration involves the amount 
of computation involved in applying the heuristic in a particular 
diagnosis. In order to facilitate the study of a class of such test 
selection heuristics, the test selection function was designed to be 
in large part independent of the particular heuristics employed. While 
the class of heuristics permitted is not particularly large, it does 
include heuristics which lead to markedly different test selection 
[...] searched by the test selection function during a particular stage in
a diagnosis are searched to the same depth. The limitations arising 
from this inflexibility will be discussed later. 

The breadth of the search is controlled indirectly by the user 
through the use of a threshold probability. At a given decision node, 
only those tests which are relevant to a state with a probability 
greater than the threshold are considered by the test selection function.
For example, if the probability distribution at a given decision
node is (0.2, 0.3, 0.5) for states $M_1$, $M_2$, $M_3$ and the threshold is 0.25,
only those tests relevant to states $M_2$ and $M_3$ will be considered.
Those tests which are relevant to $M_1$ alone will be ignored. A test
is considered relevant to a particular state only if an attribute 
which is associated with the appropriate state list in the information 
structure is a possible result of the test given the probability dis- 
tribution for the current decision node. Since the control of the 
breadth of search is indirect, in general, the user cannot easily 
predict the extent of the pruning of the decision tree which will 
result. Some feeling for reduction in the search space can be gained 
from experience with the program in a particular problem area. Note 
that in the above example, if all the tests which are relevant to
state $M_1$ are also relevant to either $M_2$ or $M_3$, then the threshold
probability will not result in any pruning of the decision tree.
The maximum search breadth is obtained with a threshold of zero.

[Footnote: An exception occurs when a particular node corresponds to a
certain diagnosis. The search of the branch containing this node
will terminate there.]
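
The effect of the threshold on the breadth of search can be illustrated with a small sketch. It simplifies the notion of relevance to a fixed mapping from each test to the states it bears on, whereas the program derives relevance from the information structure and the current distribution; the function name `relevant_tests`, the test names, and the mapping are hypothetical.

```python
def relevant_tests(dist, threshold, test_states, tests_run=()):
    """Tests relevant to at least one state whose probability exceeds the threshold."""
    likely = {state for state, p in dist.items() if p > threshold}
    return sorted(
        test
        for test, states in test_states.items()
        if test not in tests_run and states & likely
    )


dist = {"M1": 0.2, "M2": 0.3, "M3": 0.5}
# Hypothetical relevance of tests to states.
test_states = {"T1": {"M1"}, "T2": {"M1", "M2"}, "T3": {"M3"}}

selected = relevant_tests(dist, 0.25, test_states)
unpruned = relevant_tests(dist, 0.0, test_states)
print(selected, unpruned)
```

Here the test relevant to the first state alone is pruned at a threshold of 0.25, while a threshold of zero keeps every test. The optional `tests_run` argument excludes tests that have already been run.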

Like the search depth parameter, the threshold parameter can 
be set prior to each stage in the diagnosis. Also these two parameters
can be varied independently of one another (subject only to
a practical constraint of available storage). This flexibility per- 
mits the overall selection strategy to change during the course of 
the diagnosis. 

There are four routines in the test selection package, each 
performing a distinct function in the tree search. The principal 
routine is SEQDEC which serves as the main control for the process 
of test selection. The diagnostic program communicates with the 
test selection package through SEQDEC. It provides this routine 
the name of the node in the decision tree which corresponds to the 
current state of the diagnosis. SEQDEC then analyzes the tree to 
the appropriate depth and breadth to obtain the testing decision. 

Because the decision tree can require considerable storage 
even for limited search depth and breadth, the tree is developed dynamically.
That is, new levels are added only as they are needed,
and levels are erased when they have been analyzed. SEQDEC is called 
with the name of a decision node as an argument. This decision node 
is represented by an empty SLIP list which has on its DLIST a list 
containing a probability distribution over system states. This dis- 
tribution incorporates all the attributes which were observed on the 
path from the beginning of the tree to the current node.

SEQDEC first determines the expected loss for an optimal decision 
at this node. The manner in which this value is determined will be 
explained below. If the level of the current node equals the re- 
quired depth of search this expected loss is returned as the expected 
loss for the node. If not, the current loss for this node is assigned 
this value and if the level of the node is the topmost level of the 
analysis, the terminal decision and its value are stored in a special 
list. In any event an additional level must be "grown" on the tree. 
First the routine RELTST is called by SEQDEC. RELTST determines the 
set of tests which are relevant to the states whose probability at 
the current node exceeds the threshold. Excluded from this set are 
all those tests which have been actually run. These latter tests are 
known to RELTST because whenever a test is selected by the diagnostic 
program and run by the user, its name is placed on a list called 
TSTRUN in common storage. RELTST stores the names of the relevant 
tests on the current decision node list. 

After RELTST has collected the set of relevant tests, SEQDEC 
processes each of these tests in turn. SEQDEC begins reading the 
list of tests. For each test, a routine called GROW1 is invoked.
This routine determines all possible results of the given test and 
their respective probabilities. For each result, the routine con- 
structs a new decision node. First the current test is placed on the 
top of TSTRUN to simulate the running of the test and then for each 
of the possible results of the test, SEQDEC calls itself recursively
to obtain the expected loss of the resulting decision node. When 
this value has been obtained, it is weighted by the probability of 
the given result and the product accumulated. The sum of the expected 
loss for each result is combined with the cost of the test. The cur- 
rent test is removed from TSTRUN and the portion of the decision tree 
which has just been analyzed is erased. If the analysis is at the 
topmost level the value of the test is saved. This means that the 
expected losses for all alternatives at the current level are available. 
In the event that the best alternative cannot be employed (e.g. a test 
cannot be run for some reason), the next best alternative can be 
chosen. In any case, the expected loss for this test is compared with 
that of the best decision to date for the node. If it is less, the 
current test becomes the best decision. The analysis then proceeds 
to the next test alternative. When all alternatives have been evalu- 
ated for the current decision node, SEQDEC returns the expected loss 
of the best decision as determined by the analysis. 
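
The recursive evaluation described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the thesis program: the names seqdec, terminal_loss and MODEL, the dictionary layout of the model, and the use of an immutable set in place of the TSTRUN list are all hypothetical. A test is treated as relevant when it is associated with at least one state whose probability exceeds the threshold.

```python
from typing import Dict, FrozenSet, Tuple

Dist = Dict[str, float]


def terminal_loss(dist: Dist, loss: Dict[str, Dict[str, float]]) -> Tuple[float, str]:
    """Return (E, state) for the minimum expected loss terminal decision."""
    def expected(i: str) -> float:
        # loss[i][j] is the cost of deciding state i when the true state is j.
        return sum(loss[i][j] * p for j, p in dist.items())

    best = min(dist, key=expected)
    return expected(best), best


def seqdec(dist: Dist, depth: int, model: dict,
           run: FrozenSet[str] = frozenset()) -> Tuple[float, Tuple[str, str]]:
    """Return (expected loss, decision) for one decision node.

    The decision is ("decide", state) or ("test", test_name).
    """
    best_loss, state = terminal_loss(dist, model["loss"])
    best = (best_loss, ("decide", state))
    if depth == 0:
        return best
    likely = {s for s, p in dist.items() if p > model["threshold"]}
    for test, results in model["likelihood"].items():
        # Skip tests already run and tests not relevant to any likely state.
        if test in run or not (model["relevant"][test] & likely):
            continue
        total = model["cost"][test]
        for lik in results.values():
            # Probability of this result under the current distribution.
            p_result = sum(lik[s] * p for s, p in dist.items())
            if p_result == 0.0:
                continue
            posterior = {s: lik[s] * p / p_result for s, p in dist.items()}
            sub_loss, _ = seqdec(posterior, depth - 1, model, run | {test})
            total += p_result * sub_loss
        if total < best[0]:
            best = (total, ("test", test))
    return best


# Toy model (hypothetical): two states, one perfectly discriminating test.
MODEL = {
    "loss": {"A": {"A": 0.0, "B": 100.0}, "B": {"A": 100.0, "B": 0.0}},
    "cost": {"T": 1.0},
    "likelihood": {"T": {"pos": {"A": 1.0, "B": 0.0},
                         "neg": {"A": 0.0, "B": 1.0}}},
    "relevant": {"T": {"A", "B"}},
    "threshold": 0.05,
}
PRIOR = {"A": 0.5, "B": 0.5}
```

With a depth of zero the sketch returns the terminal decision only; with a depth of one it prefers the test, because the unit test cost is far below the expected loss of deciding immediately.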

The determination of the optimal terminal decision is accomplished
by a routine called DLOSS. This routine employs the probability dis-
tribution, the decision node and the loss function to determine the
value of the minimum expected loss terminal decision for the node.
If $\pi_i$ is the probability of the state $M_i$ in the current distribu-
tion and $l_{ij}$ is a typical element from the loss function matrix, DLOSS
selects state $M_k$ where

$$E = \sum_{j=1}^{n} l_{kj}\,\pi_j = \min_{i} \sum_{j=1}^{n} l_{ij}\,\pi_j$$

and E is the expected loss of the optimal terminal decision for the
node. The state selected by DLOSS and the value E are returned to
SEQDEC.
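
As a small worked illustration of this rule (the numbers are hypothetical and chosen only for easy arithmetic), take two states with probabilities 0.75 and 0.25, no loss for a correct decision, a loss of 100 for deciding $M_1$ when the state is $M_2$, and a loss of 10 for deciding $M_2$ when the state is $M_1$:

```latex
\begin{align*}
\pi &= (0.75,\ 0.25), \qquad l_{11} = l_{22} = 0, \quad l_{12} = 100, \quad l_{21} = 10 \\
\sum_{j} l_{1j}\,\pi_j &= (0)(0.75) + (100)(0.25) = 25 \\
\sum_{j} l_{2j}\,\pi_j &= (10)(0.75) + (0)(0.25) = 7.5 \\
E &= \min(25,\ 7.5) = 7.5 \quad \text{(terminal decision } M_2\text{)}
\end{align*}
```

Under these assumed losses the minimum expected loss decision is the less probable state, which is why the terminal decision coincides with the most probable state only when all misdiagnosis losses are equal.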

By controlling the breadth and the depth of the search employed
by the test selection function, the user can generate a number of
different test selection heuristics. For example, he might use a
threshold close to zero and a depth of one early in a diagnosis
when many states are still possible. Because the probability dis-
tribution based on only a few attributes may be quite diffuse, a
low threshold is needed to ensure that significant tests are not
overlooked. On the other hand, the potentially large number of de-
cision nodes requires a limited depth of search. As the diagnosis
progresses and a few states become relatively probable, the thres-
hold can be raised with less danger of missing significant tests.
With the higher threshold it may be possible to improve the evaluation
of tests by increasing the depth of the search.

The selection scheme above can be supplemented by the use of
two additional features of the program. First, the user can re-
strict the set of relevant tests to those associated with the best
terminal decision at a given node. In the case when the loss function
is a constant for all ordered pairs of states, this corresponds to
considering the tests which are relevant to the most probable state.
Since the routine DLOSS can determine the best terminal decision at
a given decision node, the appropriate state can be made available to
RELTST. By considering only the tests relevant to this state, the user
is in a sense limiting the search to those tests which will tend to
prove or disprove the hypothesis that the given state is indeed the
best decision. In practice, the user obtains this option by setting
the threshold probability to a number greater than one.

In order to permit the user an even greater facility to test
hypotheses, the program permits him to request a search for tests
to prove or disprove the hypothesis that "the state of the system
is $M_k$." If the user chooses to test such a hypothesis, the test se-
lection function will alter its method of evaluating decision nodes.
All decision losses ($l_{ij}$) are set temporarily to a certain very
high value. The routine DLOSS then considers only two states in its
evaluation of the loss for a given node. One state is $M_k$ and the
other is "not $M_k$." With these adjustments, the test selection func-
tion will rank tests according to their expected value in proving or
disproving the presence of state $M_k$.
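
One possible reading of this two-state evaluation is sketched below. It is an assumption-laden illustration rather than the routine used in the program: the function name hypothesis_loss, the parameter high (standing in for the "very high value") and the label "not k" are hypothetical.

```python
from typing import Dict, Tuple


def hypothesis_loss(dist: Dict[str, float], k: str,
                    high: float = 1.0e6) -> Tuple[float, str]:
    """Expected loss of the better of the two decisions "k" and "not k".

    Every wrong decision is assumed to cost the same large amount `high`.
    """
    p_k = dist[k]
    loss_if_accept = high * (1.0 - p_k)  # wrong whenever the state is not k
    loss_if_reject = high * p_k          # wrong whenever the state is k
    if loss_if_accept <= loss_if_reject:
        return loss_if_accept, k
    return loss_if_reject, "not " + k


# Example: the hypothesis "the state is B" is currently improbable.
example = hypothesis_loss({"A": 0.5, "B": 0.25, "C": 0.25}, "B", high=1000.0)
```

Under this reading a test is valuable to the extent that its results are expected to push the probability of the hypothesized state toward zero or one, since either extreme drives the two-state loss toward zero.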

A comparison of a number of particular selection heuristics 
employed in this research will be presented later in the thesis. 

C. THE GENERATOR PROGRAM 
The diagnostic program discussed in the previous sections is a
major tool in this research. By exploiting the interactive capabili-
ties of the program, the user can employ it directly in the solution
of actual diagnostic problems. Of equal importance, however, is the
availability of the program as a test vehicle for a variety of over-
all diagnostic strategies. By specifying the heuristics to be em-
ployed in the pattern sorting and test selection functions, one is
defining a diagnostic strategy. Since diagnostic problems tend to
be difficult and the program operation is quite complicated, it is
not an easy task to make generalizations about a given diagnostic
strategy. There are many important questions which can be asked about
a diagnostic strategy such as

* How is the performance of the program affected by noise signs? 

* What is the effect of uncertainty in the probabilities on 
the performance of the program? 

* How do various changes in the relevant probability distribu- 
tions affect program performance? 

Questions such as these are difficult to answer based on experience with
only a few problem areas. If one is constrained to work with descrip-
tions of actual systems, it may be very difficult to establish the
conditions required for the test of a particular aspect of the pro-
gram. If, on the other hand, one can employ a wide variety of system
descriptions, the program can be exercised more thoroughly. One ap-
proach is to create an information structure with the desired proper-
ties and to test the diagnostic program with simulated problems from
this artificial problem area. Information gained from such studies of
diagnosis "in the abstract" may suggest improvements in the program.
It may also provide a deeper insight into the problems involved in
solving real diagnostic problems. If such a simulation facility
were available, simulated cases generated from the structure for an
actual problem area could be utilized to conveniently investigate
aspects of diagnosis in that area.

The diagnostic system includes such a simulation facility in 
the form of the generator program. This program is the third major
part of the diagnostic system. Like the diagnostic program, the 
generator makes extensive use of the information structure. The 
system for which problems are to be simulated is described in the 
standard manner by the user. This description is converted to an 
information structure which is available to both the diagnostic pro- 
gram and the generator. The basic operation of the generator is as 
follows. First, a state is chosen at random from the set of possible
states for the system in accordance with the a priori probability 
distribution. Then a certain number of initial attributes (the 
number being controlled by the user) are generated at random given 
the description of the state in the information structure. The set 
of initial attributes constitutes the problem presented to the diag- 
nostic program. The latter is called to process these attributes. It 
selects a test in the usual manner. Given the state and the test, 
the generator selects a test result and conveys this response to 
the diagnostic program. This interaction between the generator and 
the diagnostic program continues until the latter arrives at a diagno-
sis. This diagnosis then can be compared with the "known" state
used by the generator.

As an example of the operation of the generator, consider its 
use in the following simplified problem. The generator is used 
to simulate disease case histories for the disease-attribute proba-
bility matrix presented in Table 3. The relevant tests are listed
to the right of the matrix. Assume that cases are to be drawn at 
random from the structure and that one initial attribute is to be 
presented to the diagnostic program. 

The generator first selects the disease. It does this by creat-
ing a list of all possible diseases and cumulative probabilities.
For this example, the list would be

(D1 0.3 D2 1.0)

Each cumulative probability is the sum of the a priori probabili-
ties of the diseases preceding it in the list. Then a random number
between zero and one is generated. The list of diseases and cumula-
tive probabilities, called the generation list, is searched for a
disease with the property that the probability preceding it (taken
as zero for the first disease) is less than and the probability
following is greater than the given random number. The disease
satisfying this condition is chosen for this case. Thus, if the
random number generated in the example were 0.41,
the disease selected would be D2. Assuming the disease D2 has been
chosen, the generator now selects the initial attributes which define

TABLE 3

Disease Description for Generator Example

         a priori       P(Attribute/Disease)
Disease  Probability    A1   A2   A3   A4   A5   A6       Test  Attributes

D1       0.3            0.3  0.7  0.5  1.0  0.5  0.5      T1    A1, A2
D2       0.7            0.8  0.2  0.3  0.2  0.6  0.4      T2    A3
                                                          T3    A4
                                                          T4    A5, A6

with the appropriate probabilities and returns it to the diagnostic
program. This iterative process continues until the diagnostic pro- 
gram has completed the diagnosis. 
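
The disease-selection step described above (the generation list and the random number) can be sketched as follows. This is a minimal sketch, not the generator itself: the function names generation_list and select_disease are hypothetical, and the only data used are the a priori probabilities of Table 3.

```python
import random
from typing import List, Sequence, Tuple


def generation_list(priors: Sequence[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Pair each disease with the cumulative a priori probability up to it."""
    total = 0.0
    pairs = []
    for disease, p in priors:
        total += p
        pairs.append((disease, total))
    return pairs


def select_disease(gen_list: Sequence[Tuple[str, float]], u: float) -> str:
    """Return the first disease whose cumulative probability exceeds u."""
    for disease, cumulative in gen_list:
        if u < cumulative:
            return disease
    # Guard against rounding in the final cumulative value.
    return gen_list[-1][0]


GEN_LIST = generation_list([("D1", 0.3), ("D2", 0.7)])

# The example from the text: a random number of 0.41 selects D2.
chosen = select_disease(GEN_LIST, 0.41)

# A simulated case draws its own random number between zero and one.
simulated = select_disease(GEN_LIST, random.random())
```

The same cumulative search could plausibly be reused to draw a test result given the chosen disease, with the conditional probabilities of the possible results taking the place of the a priori probabilities.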

In this example, only one attribute was generated for each test.
There are tests, however, from which several attributes can be ob-
tained. Such tests are marked in the information structure, and the
generator will generate a set of test results for these tests.

The diagnostic system will record an extensive history of each 
diagnosis or selected aspects of that history on a history file if 
requested to do so by the user. A schematic of the relationships 
among the three major parts of the diagnostic system is presented 
in Figure 14. In the remainder of this section, certain features of
the generator-diagnostic program interaction will be discussed in
detail.

The subroutine GETSYM is the principal link between the genera- 
tor and the diagnostic program. It is this routine which is called 
by the diagnostic program whenever the latter requires a test to be 
run. If the diagnostic program is being controlled by the user from
the console, then GETSYM retrieves the test results from him. If
the generator is in control, a routine called GENSYM is invoked to
generate an appropriate response to the chosen test. The diagnostic
program itself is independent of the source of responses to tests.
GENSYM is also used by the generator to select the initial attributes
of a problem. All system output (such as requests for test results,
distributions, etc.) is processed by a special output package. This

[Figure 14. Schematic of the diagnostic system]

another advantage is that many cases can be simulated in a rea-
sonable amount of time.

The second problem encountered in the diagnosis of bone
tumors is the large number of potentially useful attributes which
can be extracted from a radiograph. Generally speaking, there are
four direct kinds of information which are obtained from a radio-
graph of a bone tumor (R-20):

1) Destruction of bone 

2) Proliferation of bone 

3) Mineralization of tumor matrix 

4) Location, size, and shape of tumor.

Each of these general classes of information is broken down into
a number of more specific attributes. The result is the large num-
ber of attributes mentioned above. Hence, the diagnostician is con-
fronted with a considerable amount of data which he may employ in
classifying a particular tumor.

The particular study discussed here involved the diagnosis 
of actual cases of bone tumors, each of which was classified into 
one of nine histological types. These types are listed in Table 4. 
The evidence employed in the diagnoses consisted of fifty-three 
attributes obtained principally from radiographs. (The age of the 
patient was the only non-radiologic attribute considered.) The 
attributes are listed in Table 5 along with their abbreviations 
used in discussions of particular diagnoses. 

The case histories and the disease-attribute probability matrix 
used in this study were obtained from Dr. G. S. Lodwick of the
University of Missouri. Dr. Lodwick and his associates developed
the matrix as a result of many years' experience with cases of bone
tumors. Thus, the matrix represents the distillation of extensive 
diagnostic experience with the problem. It reflects both the stat- 
istical experience and understanding of the disease processes in- 
volved of the workers who created it. The papers cited above sum- 
marize their work and are recommended to any reader who is interested 
in a more authoritative view of the problem than the competence of 
this author permits him to present. 

B. Experiments in Bone Tumor Diagnosis 

The diagnostic system was used to study various aspects of
bone tumor diagnoses. The disease-attribute probability matrix pro-
vided by Dr. Lodwick was used as the basis for an information struc-
ture for the system. A state was defined for each of the nine types
of bone tumor. A set of thirty-two tests was defined. Some of
these tests, such as that of determining the age of the patient, can
result in one of a number of attributes. In the case of the age
test, the possible attributes are: 1) age 0 to 9 years, 2) age 10
to 19 years, 3) age 20 to 29 years, 4) age 30 to 39 years, and
5) age 40 years and over. Other tests are specific for one attrib-
ute, such as the test of checking for geographic destruction of
bone. The set of tests and the respective attributes which may re-
sult is presented in Table 6. Throughout the remainder of this

TABLE 4

HISTOLOGICAL TYPES FOR BONE TUMOR DIAGNOSIS

                                               Relative
     Type                       Abbreviation   Incidence

 1.  Chondroblastoma            CB             0.05
 2.  Chondrosarcoma             CS             0.17
 3.  Ewing's Sarcoma            ES             0.15
 4.  Fibrosarcoma               FS             0.10
 5.  Giant Cell Tumor           GC             0.15
 6.  Osteosarcoma               OS             0.25
 7.  Parosteal Sarcoma          PS             0.05
 8.  Reticulum Cell Sarcoma     RC             0.05
 9.  Chondromyxoid Fibroma      CF             0.03
                                               ----
                                               1.00

Note: This formulation assumes that each patient has one and
only one of the given diseases.

TABLE 5

ATTRIBUTES FOR BONE TUMOR DIAGNOSIS

Attribute   Meaning

S02         Age 00-09 years
S03         Age 10-19 years
S04         Age 20-29 years
S05         Age 30-39 years
S06         Age 40 years and over
S07         Tumor Size 01-30 Millimeters
S08         Tumor Size 31-60 Millimeters
S09         Tumor Size 61-90 Millimeters
S10         Tumor Size 91 MM and over
S11         Shape-Round (L LT 1.5 X W)
S12         Shape-Elongated (L GE 1.5 X W)
S13         Location-Central
S14         Location-Eccentric
S15         Location-Cortex/Parosteal
S16         Long Bone
S17         Flat Bone
S18         Small Bone
S19         Sacrum and Pelvis
S20         Any Bone-Epiphysis
S21         Any Bone-Growth Plate
S22         Tubular Bone-Articular Cortex
S23         Tubular Bone-Metaphysis
S24         Tubular Bone-Shaft
S27         Matrix-Radiolucent
S28         Matrix-Floccules
S29         Matrix-Solid
S30         Matrix-Lumpt
S31         Matrix-Clouds
S32         Destruction-Geographic
S33         Destruction-Motheaten
S34         Destruction-Permeated
S35         Margin-Regular
S36         Margin-Lobulated
S37         Margin-Ragged
S38         Margin-Indistinct
S39         Transition Sharp or Smudged
S40         Invasive Zone
S41         Special Sign-Fracture
S42         Special Sign-Displacement
S43         Proliferation-Sclerotic Rim
S44         Prolif.-Multiple Small Foci
S45         Proliferation-Endostosis
S46         Periosteal-Hyperostosis
S47         Periosteal-Buttress
S48         Periosteal-Trabeculae (Septae)
S49         Cortex Expanded
S50         No Codman's Triangle
S51         One Codman's Triangle
S52         Two or More Codman's Triangles
S53         No Periostosis
S54         Laminated Periostosis
S55         Amorphous Periostosis
S56         No Spiculation
S57         Sunburst Spiculation
S58         Hair-on-end Spiculation
S59         Velvet Spiculation
S60         Periosteal Response-Delicate
S61         Periosteal Response-Coarse

TABLE 6



TESTS FOR BONE TUMOR DIAGNOSIS

     Test      Possible Results

 1.  TEST2     S02, S03, S04, S05, S06
 2.  TEST7     S07, S08, S09, S10
 3.  TEST11    S11, S12
 4.  TEST13    S13, S14, S15
 5.  TEST16    S16, S17, S18, S19
 6.  TEST20    S20, N
 7.  TEST21    S21, N
 8.  TEST22    S22, N
 9.  TEST23    S23, N
10.  TEST24    S24, N
11.  TEST27    S27, N
12.  TEST28    S28, N
13.  TEST29    S29, N
14.  TEST30    S30, N
15.  TEST31    S31, N
16.  TEST32    S32, N
17.  TEST33    S33, N
18.  TEST34    S34, N
19.  TEST35    S35, S36, S37, S38
20.  TEST39    S39, S40, N
21.  TEST41    S41, S42, N
22.  TEST43    S43, N
23.  TEST44    S44, N
24.  TEST45    S45, N
25.  TEST46    S46, N
26.  TEST47    S47, N
27.  TEST48    S48, N
28.  TEST49    S49, N
29.  TEST50    S50, S51, S52
30.  TEST53    S53, S54, S55
31.  TEST56    S56, S57, S58, S59
32.  TEST60    S60, S61

Note: The symbol "N" denotes a "normal" attribute. It means
that a test may fail to reveal any of the other attrib-
utes listed. Thus, for TEST41, the possible results are
S41 or S42 or neither S41 nor S42 (N).

chapter, the abbreviations for diseases and attributes presented
in Table 4 and Table 5 will be used. In the initial set of ex-
periments, all tests were assigned unit cost and the cost of all
misdiagnoses (e.g. deciding the tumor is CS when it is really GC) 
was assumed to be 100,000. This number is quite arbitrary, and is 
used simply to make the decision losses much greater than the test- 
ing losses. 

Experiment 1 . Diagnosis Based on All Attributes 

Each of the twelve case histories was presented to the diag- 
nostic program by inputting all the attributes for the case. The 
diagnostic program processed the attributes through the inference 
function and obtained a posterior distribution for the type of 
tumor. The results of this experiment are presented in Table 7 along 
with the diagnosis of a pathologist provided with each case history. 
The latter is traditionally accepted as the definitive diagnosis 
in cases of this type. 
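
How a posterior distribution is obtained from a set of observed attributes can be illustrated with a short sketch. It is deliberately simplified and rests on assumptions that are not the program's own: the attributes are treated as conditionally independent given the disease (the inference function described earlier has its own way of handling non-independent attributes), an absent attribute contributes one minus its conditional probability, and the function name posterior is hypothetical. The numbers are the two-disease values of Table 3 rather than the bone tumor matrix.

```python
from typing import Dict


def posterior(prior: Dict[str, float],
              likelihood: Dict[str, Dict[str, float]],
              observed: Dict[str, bool]) -> Dict[str, float]:
    """Bayes' rule over diseases, assuming independent attributes."""
    weights = {}
    for disease, p in prior.items():
        w = p
        for attribute, present in observed.items():
            q = likelihood[disease][attribute]  # P(attribute | disease)
            w *= q if present else (1.0 - q)
        weights[disease] = w
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("observed attributes are impossible under every disease")
    return {disease: w / total for disease, w in weights.items()}


PRIOR = {"D1": 0.3, "D2": 0.7}
LIKELIHOOD = {
    "D1": {"A1": 0.3, "A4": 1.0},
    "D2": {"A1": 0.8, "A4": 0.2},
}

# Attribute A1 is observed: the weights are 0.3*0.3 and 0.7*0.8.
after_a1 = posterior(PRIOR, LIKELIHOOD, {"A1": True})

# Attribute A4 is found absent: D1 is ruled out, since P(A4 | D1) = 1.0.
after_not_a4 = posterior(PRIOR, LIKELIHOOD, {"A4": False})
```

When all misdiagnoses are equally costly, the terminal decision is simply the disease with the largest posterior probability.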

TABLE 7

Diagnoses Based on all Available
Attributes for Actual Bone Tumor Case Histories

Case   Posterior Distribution*     Pathology

  1    CB 0.12, GC 0.87            GC
  2    OS 0.65, CS 0.35            OS
  3    CB 1.00                     CB
  4    CS 0.99                     CS
  5    OS 1.00                     OS
  6    ES 0.33, RC 0.67            RC
  7    CS 0.78, FS 0.22            CS
  8    ES 0.04, FS 0.02, RC 0.94   ES
  9    ES 1.00                     ES
 10    CS 1.00                     CB
 11    GC 0.65, CF 0.35            GC
 12    PS 0.99                     PS

* Only types with posterior probability greater than or equal to
0.01 are shown in the tables in this chapter.

Experiment 2. Sequential Diagnoses -- Actual Case Histories

The second experiment exercised the sequential capabilities
of the diagnostic program. Again, all diseases were taken to be
equally serious ($l_{ij}$ = 100,000, i ≠ j) and all tests were assigned
unit cost. The same twelve cases were analyzed by the program.
For each case, the program was presented with a set of initial at-
tributes. This set was obtained by collecting the results of the
first ten tests listed in Table 6 from the case histories. Thus each
diagnostic problem was defined by approximately ten attributes. (In 
certain cases this number was smaller, because some tests are not 
relevant to specific bones.) 

After processing the initial attributes for the case, the pro-
gram employed the test selection function to select a test to be run.
The results of the test selected were determined by consulting the 
given case history. The attribute or attributes resulting from this 
test were given to the program and the inference-test selection cycle 
repeated. Throughout this experiment the test selection function 
searched the decision tree to a depth of one and limited the breadth 
of search to those tests relevant to the most likely disease type. 

For each case, this sequential diagnosis was continued until 
the diagnostic program terminated the process. This termination 
occurred when the program determined the expected reduction in loss 
for the best test at the current decision node was less than the 
cost of the test. 

An example of a sequential diagnosis is presented in Table 8 and
the results of the experiment are summarized in Table 9. 

The results of Experiment 2 underscore the potential advantage 
of sequential analysis of attributes in diagnosis. Since all diseases 
were taken to be equally serious for this experiment, the program 
found the best terminal decision to be the most probable disease. Since 
these same conditions held in Experiment 1, it is easy to make compari- 
sons between the results of the two experiments. 



TABLE 8

Sequential Diagnosis -- An Example

(Actual Case History 12)

Test         Resulting Attributes       Posterior Distribution

1.           S05, S10, S12, S15,        CS 0.42, ES 0.13, FS 0.10,
             S16, NOT S20, NOT S21,     PS 0.31, RC 0.02
             NOT S22, S23, S24

2. TEST29    S29                        CS 0.06, FS 0.02, OS 0.01, PS 0.91

3. TEST50    S50                        CS 0.06, FS 0.01, PS 0.92

4. TEST56    S56                        CS 0.05, FS 0.02, PS 0.93

Terminal decision -- PS
Pathology report -- PS

TABLE 9

Sequential Diagnosis of Bone Tumor Cases
Summary of Results for Actual Case Histories

            Number of         Distribution at              Distribution
            Tests Selected    Point of Terminal            When all Attrib-
Pathology   by Program        Decision                     utes Considered

 1. (GC)     9                CB 0.21, GC 0.78             CB 0.12, GC 0.87
 2. (OS)    12                CS 0.79, OS 0.21             CS 0.65, OS 0.35
 3. (CB)                      CB 1.00                      CB 1.00
 4. (CS)     4                CS 0.80, ES 0.08,            CS 0.99
                              FS 0.08, OS 0.04
 5. (OS)     4                CS 0.03, ES 0.02,            OS 1.00
                              OS 0.94, RC 0.03
 6. (RC)    13                ES 0.30, FS 0.01, RC 0.68    ES 0.33, RC 0.67
 7. (CS)     4                CS 0.74, FS 0.26             CS 0.78, FS 0.22
 8. (ES)    11                ES 0.05, FS 0.07, RC 0.87    ES 0.04, FS 0.02, RC 0.94
 9. (ES)     5                CS 0.02, ES 0.88,            ES 1.00
                              OS 0.05, RC 0.05
10. (CB)     3                CB 0.96, CF 0.04             CB 1.00
11. (GC)     5                CS 0.10, ES 0.01,            GC 0.65, CF 0.35
                              GC 0.81, CF 0.08
12. (PS)     3                CS 0.05, FS 0.02, PS 0.93    PS 0.99

Average number of initial attributes   9.4
Average number of tests by program     7.1

With regard to "accuracy," it can be seen that the lists of
terminal decisions from the two experiments are identical and these 
decisions are the same as those of the pathologist in ten of the 
twelve cases. The major difference between the two sets of results 
is the average number of tests performed per diagnosis. In the first 
case this average is 30. (The average is less than 32 because some 
test results were not available or were not relevant for a given 
case and the test was not counted.) Sequential analysis of the given 
cases required an average of 16.7 tests per case. This average in- 
cludes 9.4 tests on the average to obtain the initial attributes. 
Thus, by employing sequential analysis, the program in each case
obtained the same diagnostic decision as it obtained using all attrib-
utes, but with only slightly more than half as many tests.

The nature of diagnosis of bone tumors makes this saving seem 
immaterial. That is, almost all attributes are obtained from a 
radiograph, and once the radiograph has been obtained, the marginal 
cost of the tests considered here is essentially zero. One can 
easily imagine a situation, however, in which tests are completely 
independent of one another. In such a situation, the savings from 
sequential diagnosis might be quite significant. The fact that the 
performance of a diagnostician should be assessed in terms of both 
accuracy and cost favors the sequential mode of operation for the 
program. The question of how to assess the performance of a diag-
nostician will be considered at greater length later.

Another difference between the results of the two experiments
is found in the posterior distributions at the points of a terminal 
decision. The average value of the maximum likelihood probability 
for the terminal decisions can be taken as an indication of the 
equivocation or uncertainty in the average decision. For Experi- 
ment 1 this value is 0.85 while for Experiment 2, it is 0.80. 
Therefore, the sequential diagnoses terminate on slightly less 
"certain" decisions. 

Experiment 3 . Sequential Analysis — Simulated Case Histories 

Table 10 presents the results of the sequential diagnoses of 
ten simulated case histories. The generator function was used to 
develop the cases and the diagnostic program employed as usual. 
Again, all diseases were taken to be equally serious and all tests 
were assigned unit cost. 

Again, the marked advantage of sequential diagnosis is evi-
dent. The average number of tests required for diagnosis was 17.0.
Based on a maximum likelihood terminal decision, the diagnostic pro-
gram's terminal decision was correct in nine of ten cases.

On the average, the diagnostic program was more certain of its
terminal decisions than in the previous experiments (average proba-
bility of terminal decision = 0.905).

TABLE 10
Sequential Diagnosis of Simulated Case Histories

      Histological    Distribution at Point
      Type            of Terminal Decision

 1.   FS              CS 0.26, FS 0.73
 2.   ES              ES 0.88, OS 0.01, RC 0.11
 3.   OS              OS 1.00
 4.   GC              CS 0.01, GC 0.79, CF 0.20
 5.   ES              CS 0.01, ES 0.94, OS 0.04
 6.   RC              CS 0.05, FS 0.78, RC 0.16
 7.   CB              CB 0.93, GC 0.02, CF 0.05
 8.   OS              OS 0.98, CS 0.02
 9.   FS              CS 0.11, FS 0.88
10.   GC              CB 0.04, FS 0.01, GC 0.94, CF 0.01

[The columns "Number of Initial Attributes" and "Number of Tests
Selected by Program" survive only as the unaligned entries
14, 11, 5, 12, 11, 11, 10, 11, 12.]

Average number of initial attributes   9.1
Average number of tests by program     7.9



Chapter 6 
DIAGNOSIS OF CONGENITAL HEART DISEASE 

A. The Nature of the Diagnostic Problem 

A prolonged study of a group of thirty-four types of congeni- 
tal heart disease has been conducted by Warner and his associates 
(R12, R13, R14). As a result of this study, they developed a 
disease-attribute probability matrix for thirty-five types (includ- 
ing "normal") and fifty-seven attributes. The attributes can be 
grouped into four main categories: murmurs, electrocardiogram find- 
ings, X-ray findings, and other symptoms and physical signs. The 
problem of diagnosing heart disease cases based on this matrix is 
more difficult than the bone tumor problem discussed in Chapter 5. 
One reason for the increased difficulty is simply the increased 
number of diseases. Also certain groups of diseases have quite 
similar attribute probabilities in the matrix. 

As noted in Chapter 2, Warner developed a computer program to 
perform diagnosis of congenital heart disease patients based on a 
Bayesian analysis of their signs and symptoms. His program employs 
the matrix mentioned above, but in addition it must account for cer- 
tain dependencies (such as mutual exclusion of signs or symptoms). 
From the performance measures presented in Chapter 2, it can be
seen that Warner's program performs at the level of an experienced
physician.

The experiments discussed here involved the use of the disease-
attribute probability matrix prepared by Warner in the diagnosis of
congenital heart disease. As before, the matrix was the basis for
each of the disease types and the appropriate attribute lists created.
Twenty-eight tests were also defined for the problem. Dr. Warner
provided nine case histories, each with the correct diagnosis and
the diagnosis obtained by his program. In this instance, the cor-
rect diagnoses were determined by follow-up studies such as heart
catheterization or autopsy.

Table 11 presents the names of the thirty-five states of the 
information structure used in these experiments and the names of 
the corresponding diseases. Table 12 lists the attributes of the 
problem; and Table 13 the tests. 

B. Experiments in Congenital Heart Disease Diagnosis 
Experiment 4 . Diagnosis Based on All Attributes 

The first experiment tested the diagnostic capability of the 
program given all the known attributes for each of the actual case 
histories provided by Dr. Warner. The results of this experiment 
are summarized in Table 14. In each instance, the diagnostic pro-
gram duplicated the results obtained by Warner's program for the
given case history. (That is, both programs arrived at the same
posterior probability distribution given all attributes.)
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TABLE 11 
Heart Disease Types

State   Disease

D01     Normal
D02     Atrial septal defect
D03     Atrial septal defect with pulmonary stenosis
D04     Atrial septal defect with pulmonary hypertension
D05     Atrio-ventricular communis
D06     Partial anomalous pulmonary venous connection
D07     Total anomalous pulmonary venous connection
D08     Tricuspid atresia (without transposition)
D09     Ebstein's anomaly
D10     Ventricular septal defect with valvular pulmonary stenosis
D11     Ventricular septal defect with infundibular pulmonary stenosis
D12     Pulmonary stenosis, valvular, gradient [?] 40 mm. Hg.
D13     Pulmonary stenosis, infundibular, gradient [?] 40 mm. Hg.
D14     Pulmonary atresia
D15     Peripheral pulmonary stenosis
D16     Pulmonary hypertension
D17     Aortic pulmonary window
D18     Patent ductus arteriosus
D19     Pulmonary arterio-venous fistula
D20     Congenital mitral disease
D21     Primary myocardial disease
D22     Anomalous origin of coronary artery
D23     Congenital aortic disease
D24     Ventricular septal defect with pulmonary flow [?] 1.4 systemic flow
D25     Coarctation of aorta
D26     Truncus arteriosus
D27     Transposition
D28     Hypertrophic subaortic stenosis
D29     Absent aortic arch
D30     Ventricular septal defect with pulmonary flow [?] 1.4 systemic flow
D31     Ventricular septal defect with pulmonary hypertension
D32     Patent ductus arteriosus with pulmonary hypertension
D33     Tricuspid atresia with transposition
D34     Pulmonary stenosis gradient [?] 40 mm. Hg.
D35     Ruptured sinus Valsalva

Note: [?] marks a relational symbol that is not legible in this copy.

TABLE 12
Attributes for Congenital Heart Disease

Sign    Meaning

S01     Age, less than 1 year
S02     Age, 1 year to 20 years
S03     Age, 20 or more years
S04     Cyanosis, mild
S05     Cyanosis, severe (with clubbing)
S06     Cyanosis intermittent
S07     Cyanosis differential
S08     Squatting
S09     Apex systolic
S10     Apex systolic, holo
S11     Apex systolic, mid
S12     Apex diastolic
S13     Apex diastolic, early
S14     Apex diastolic, late
S15     L 4th systolic
S16     L 4th systolic, holo
S17     L 4th systolic, mid
S18     L 4th continuous
S19     L 4th diastolic
S20     L 4th diastolic, holo
S21     L 4th diastolic, early
S22     L 2nd systolic
S23     L 2nd systolic, holo
S24     L 2nd systolic, mid
S25     L 2nd continuous
S27     R 2nd systolic
S28     R 2nd diastolic
S29     Post systolic
S30     Post continuous
S31     Murmur louder than gr 3/6 (10 mm)
S35     Accentuated P2
S36     Diminished P2
S37     Fixed split P2
S38     Femoral pulse less than brachial
S40     Atrial fibrillation or broad notched P wave
S41     Axis, right (more than 110°)
S42     Axis, left (less than 0°)
S43     R wave greater than 1.2 mv in lead V1
S44     rR' or qR in lead V1
S45     R wave greater than 2.5 mv in lead V6
S46     T wave inversion in lead V5
S47     Rib notching
S48     Peripheral vessels increased
S49     Peripheral vessels decreased
S50     Hilar vessels increased
S51     Hilar vessels decreased
S52     Main pulmonary artery large
S53     Main pulmonary artery not seen
S54     Aorta large
S55     Aorta small
S56     Cardiomegaly
S57     Snowman

TABLE 13
Tests for Heart Disease Diagnosis

Tests           Possible Results
 1. TEST1       S01, S02, S03
 2. TEST4       S04, S05, S06, S07, N
 3. TEST8       S08, N
 4. TEST9       S09, N
 5. TEST10      S10, S11, N
 6. TEST12      S12, N
 7. TEST13      S13, S14, N
 8. TEST15      S15, N
 9. TEST16      S16, S17, S18, N
10. TEST19      S19, N
11. TEST20      S20, S21, N
12. TEST22      S22, N
13. TEST23      S23, S24, S25, N
14. TEST27      S27, N
15. TEST28      S28, N
16. TEST29      S29, S30, N
17. TEST31      S31, N
18. TEST35      S35, S36, N
19. TEST37      S36, S37, N
20. TEST38      S38, N
21. TEST40      S40, N
22. TEST41      S41, S42, N
23. TEST43      S43, N
24. TEST44      S44, N
25. TEST45      S45, N
26. TEST46      S46, N
27. TEST47      S47, N
28. TEST48      S48, S49, N
29. TEST50      S50, S51, N
30. TEST52      S52, S54, N
31. TEST54      S54, S55, N
32. TEST56      S56, N
33. TEST57      S57, N
TABLE 14

Diagnoses Based on All Available Attributes
for Actual Heart Disease Case Histories

Case   Posterior Distribution*                       Definitive Diagnosis
1      D03 0.91, NORMAL 0.04, D34 0.03               D09
2      D05 0.84, D02 0.09, D31 0.03, D04 0.03        D04
3      D32 1.00                                      D02
4      D20 0.41, D28 0.38, NORMAL 0.22, D24 0.04,    NORMAL
       D34 0.02, D11 0.01
5      D08 0.94, D33 0.05                            D33
6      D32 0.98, D29 0.02                            D32
7      D31 0.47, D30 0.37, D05 0.08, D02 0.03,       D31
       D32 0.02
8      D30 0.87, D02 0.12                            D30
9      D31 0.70, D27 0.20, D26 0.10                  D27

* Only diseases with probability greater than or equal to 0.01
are shown.
Experiment 5. Sequential Diagnosis of Heart Disease Cases 

The actual heart disease cases were also diagnosed by the pro- 
gram using the sequential mode of operation. In each case, the 
initial attributes presented to the program were the results from 
a set of seven tests relating to physical signs. The diseases 
were assumed to be equally serious (l_ij = 100,000, i ≠ j) and all 
tests were assigned unit cost. The search depth in the test se- 
lection function was one in each case. 

A summary of the results of this experiment is presented in 
Table 15. Again, the advantage of sequential diagnosis is appar- 
ent. The program required an average of 5.8 tests to obtain a 
diagnosis compared to the thirty-three tests required to determine 
all attributes. This small number of tests is interesting. Recall
that the sequential diagnosis of the bone tumor cases required an
average of 6.7 tests per case, although that problem involves only
one quarter as many states as the heart disease problem. Several
reasons might be advanced to account for this. First, the tests 
associated with heart disease may include a number which have little 
value in differentiating groups of diseases. Thus, in a given 
problem, the test selection function may choose a terminal decision 
after relatively few tests have been run. A second reason may be
the relevance of more inter-attribute relationships in the heart
disease problem. Such relationships may be quite useful in diagnosis,
but the testing sequences for them are not examined since the
depth of the tree search is limited to one level. Unfortunately, an
increase in the depth of search leads to prohibitive amounts of
computation in the heart disease problem. A deeper search may be
possible if more powerful breadth-limiting heuristics are developed.

TABLE 15
Sequential Diagnosis of Actual Heart Disease Cases

Case and      Number of         Distribution      Distribution
Definitive    Tests Selected    at Terminal       Based on All
Diagnosis     by Program        Decision          Attributes

1. D09        10                NORMAL 0.04       NORMAL 0.04
                                D02    0.06       D03    0.91
                                D03    0.69       D34    0.03
                                D11    0.02
                                D18    0.05
                                D26    0.03
                                D34    0.03

2. D04        4                 D02    0.08       D02    0.09
                                D04    0.17       D04    0.03
                                D05    0.62       D05    0.83
                                D31    0.10       D31    0.03

3. D02        1                 D27    0.03       D32    1.00
                                D32    0.96

4. NORMAL     10                NORMAL 0.07       NORMAL 0.22
                                D10    0.03       D28    0.38
                                D11    0.07       D24    0.04
                                D12    0.02       D20    0.41
                                D20    0.67       D34    0.02
                                D24    0.01       D11    0.01
                                D28    0.10

5. D33        3                 D08    0.92       D08    0.94
                                D33    0.01       D33    0.05

6. D32        0                 D32    0.98       D32    0.98
                                D29    0.01       D29    0.02

7. D31        10                D04    0.01       D31    0.47
                                D05    0.09       D30    0.37
                                D31    0.86       D05    0.08
                                D32    0.02       D32    0.02

8. D30        8                 D02    0.03       D30    0.87
                                D05    0.02       D02    0.12
                                D20    0.01
                                D30    0.89

9. D27        6                 D11    0.02       D31    0.70
                                D19    0.01       D27    0.20
                                D24    0.06       D26    0.10
                                D26    0.06
                                D31    0.77
                                D33    0.03

Average number of initial attributes = 7
Average number of tests by program = 5.8

On the whole, the performance of the program with sequential 
diagnosis is comparable to that when all attributes are available. 
The one apparent exception to this involves case 9. Here the se- 
quential diagnosis failed to assign a probability of greater than 
0.01 to disease D27. The seriousness of this failure depends on 
medical considerations which are not discussed here. The general 
problem of measuring diagnostic performance, however, will be dis- 
cussed in Chapter 8. 



Chapter 7 
FURTHER EXPERIMENTS WITH THE DIAGNOSTIC SYSTEM 

In order to explore the potential value of the diagnostic 
system as a tool for the study of a variety of diagnostic problems 
and strategies, some further experiments were performed. The re- 
sults of these experiments are reported in this chapter. 

Experiment 6. The Effect of a Very Serious State 

In the experiments discussed in Chapters 5 and 6, it was 
assumed that the loss for misdiagnosis was the same for all pairs 
of diseases. For each experiment, the elements of the loss function
matrix were taken to be 0 for l_ii and 100,000 for l_ij, i ≠ j.
For this reason, the diagnostic program always selected the most 
likely disease as its terminal decision. One can easily imagine 
situations, however, in which the assumption of a constant loss for 
misdiagnosis independent of the actual disease is unrealistic. For 
example, it may be far more serious to diagnose pneumonia as a common
cold than vice versa. Since the diagnostic program incorporates
such considerations in its rules for selecting a terminal decision, 
changes in the loss function matrix can result in pronounced 
changes in its decisions. 
This effect was observed in two different situations. In
the first, the loss function matrix is presented in Table 16. Note
that it is very costly to miss the diagnosis of CB. The misdiag- 
nosis of either CS or ES as a disease other than one of these two 
or CB is quite serious, but it is not particularly serious to diag- 
nose CS as ES or CB or ES as CS or CB. Failure to diagnose one of 
the remaining diseases results in a loss which is independent of 
the diagnosis made. 

The generator was used to generate seven case histories of 
bone tumor cases. Each case was diagnosed by the diagnostic pro- 
gram in the light of the new loss function. The results of this 
experiment are summarized in Table 17. From this table, it can 
be seen that the new loss function affects only one decision, that 
of case 3. In this case, the diagnostic program selected CB as 
the terminal decision in spite of the fact that GC (the actual di- 
sease) was more than three times as probable. The loss for diag- 
nosing CB as GC is 1,000 times that of diagnosing GC as CB, however, 
and this fact dominates the decision of the program. The relative 
seriousness of CB does not affect the diagnoses of the remaining 
cases because the observed attributes excluded CB as a possibility 
in each case. 
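
The decision in case 3 can be checked with a short calculation. The following is a minimal sketch, not the thesis program: it assumes that the terminal decision is the state with the smallest expected loss, and it uses only the two states with non-zero probability in case 3 together with the two relevant entries of Table 16 (100 and 1, in thousands). The function names are illustrative.

```python
def expected_loss(decision, posterior, loss):
    """Expected loss of announcing `decision`; loss[d][a] is the loss for
    diagnosing d when the actual state is a (taken as zero when d == a)."""
    return sum(p * loss[decision].get(actual, 0.0)
               for actual, p in posterior.items())


def best_terminal_decision(posterior, loss):
    """Terminal decision with the smallest expected loss."""
    return min(loss, key=lambda d: expected_loss(d, posterior, loss))


# Case 3 of Table 17, restricted to the two states with non-zero probability.
posterior = {"CB": 0.24, "GC": 0.76}
# Entries read from Table 16, in thousands: diagnosing GC when CB is present
# costs 100; diagnosing CB when GC is present costs 1.
loss = {"CB": {"GC": 1.0}, "GC": {"CB": 100.0}}

for d in loss:
    print(d, round(expected_loss(d, posterior, loss), 2))
print("terminal decision:", best_terminal_decision(posterior, loss))
```

Under these assumptions the expected loss of announcing GC is far larger than that of announcing CB, even though GC is the more probable state, which is the behavior reported for case 3.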

The effect of a serious disease on diagnosis can be made even
more pronounced if the serious disease is not easily distinguished
from other less serious ones. For example, the disease CS often
appears in a terminal distribution when the actual disease is another.
This means that CS has not been excluded as a possible diagnosis when
a terminal decision is made. By making CS very serious relative to
the other diseases, the decisions of the program can be strongly
influenced.

TABLE 16

Loss Function Matrix for Bone Tumor Diagnosis
(in thousands)

                            Actual Disease
Diagnosis    CB    CS    ES    FS    GC    OS    PS    RC    CF
CB            0   0.1   0.1     1     1     1     1     1     1
CS          100     0   0.1     1     1     1     1     1     1
ES          100   0.1     0     1     1     1     1     1     1
FS          100    10    10     0     1     1     1     1     1
GC          100    10    10     1     0     1     1     1     1
OS          100    10    10     1     1     0     1     1     1
PS          100    10    10     1     1     1     0     1     1
RC          100    10    10     1     1     1     1     0     1
CF          100    10    10     1     1     1     1     1     0

TABLE 17

Sequential Diagnosis of Cases for Loss
Function of Table 16

Case and   Number of Ini-    Number of Tests       Distribution at
Disease    tial Attributes   Selected by Program   Terminal Decision
1. (PS)    15                1                     PS* 1.00
2. (GC)    8                 7                     GC* 0.90, FS 0.09, CS 0.01
3. (GC)    9                 3                     CB* 0.24, GC 0.76
4. (ES)    10                0                     ES* 0.99, RC 0.01
5. (ES)    8                 2                     ES* 0.96, CS 0.02
6. (OS)    13                0                     OS* 1.00
7. (GC)    8                 12                    GC* 0.89, FS 0.09, CS 0.02

* Terminal decision by program.

The loss function matrix presented in Table 18 represents just 
this situation. A series of simulated cases was diagnosed by the pro- 
gram using this loss function. The results of this experiment are 
summarized in Table 19. Here the seriousness of CS dominates all 
decisions, and the terminal decision is CS in all cases. Note also 
that the terminal decision is made after relatively few tests have 
been run and while the posterior distribution is relatively diffuse. 
The predominance of terminal decisions for disease CS is a result 
of the seriousness of that disease. The decrease in the number of 
tests per case and the diffuse terminal distributions reflect the 
difficulty of finding a single test which promises to alter the
expected loss significantly. Since the diagnostic program employed a
one-level look-ahead in searching the decision tree for these cases, it
did not consider possible sequences of several tests to resolve this 
problem. This point will be discussed in more detail later in the 
thesis. 

The above example is but one in which the loss function has a 
significant effect on the terminal decisions made by the diagnostic 
program. Because the test selection strategy also accounts for the 
loss function, it, too, is affected by changes in the matrix. There- 
TABLE 18

Loss Function Matrix for Bone Tumor Diagnosis
(in thousands)

                            Actual Disease
Diagnosis    CB    CS    ES    FS    GC    OS    PS    RC    CF
CB            0     1     1     1     1     1     1     1     1
CS          100     0     1     1     1     1     1     1     1
ES          100     1     0     1     1     1     1     1     1
FS          100     1     1     0     1     1     1     1     1
GC          100     1     1     1     0     1     1     1     1
OS          100     1     1     1     1     0     1     1     1
PS          100     1     1     1     1     1     0     1     1
RC          100     1     1     1     1     1     1     0     1
CF          100     1     1     1     1     1     1     1     0






TABLE 19

Sequential Diagnoses of Cases
for Loss Function of Table 18

Case and
Disease

1. (FS)
2. (CS)
3. (CS)
4. (OS)
5. (CB)
6. (GC)
7. (OS)




14 



15 



3 
2 




CS* 0.56 


18 


0.02 


18 


0.34 


or- 


0;07 


CS* 0.96 


FS 


0.02 


E8 


0.02 


CB* O.li 


m 


**55 


QC 


0.03 


* 


0*82 


i€ 


0.30 


CS* 0.08 


08 


•0*91 


CS* 0,16 


G8 


0.21 


88 


0.03 


78 


0.11 


«C: 


0*48 


08*0.15 


C8 


0.04 


18 


0*07 


oc 


0.53 


;08- 


0.12 


PS 


0*06 


CSfeO.01 


08 


fc.88 


FS 


0.06 



* Terminal decision by program. 
fore, an important facility in the study of diagnostic strategies
for a particular application is the ability to assess the sensitivity
of these strategies to the loss function. Although the current
version of the diagnostic system restricts the loss function to a
matrix form, it is still possible to employ wide ranges of the
values of the matrix elements in a given application study. This
facility, coupled with the capabilities of the generator, makes it
possible to study the performance of different versions of the
diagnostic program with a variety of matrix loss functions.

Experiment 8. Studies of a Test-Selection Heuristic 

The experiments discussed in Chapters 5 and 6 indicate the value 
of sequential diagnosis in reducing the number of tests required for 
a diagnosis. Therefore, it is worth some effort to improve the
operation of the test-selection function.

One problem which can arise in the use of the test-selection
function of the current system is the appreciable amount of
computation required to evaluate all the relevant tests at a given
decision node. It would be quite desirable to reduce the amount of
computation devoted to test selection provided that the diagnostic 
capability of the program were not impaired. As an example of the 
amount of computation involved in test selection, consider the
following. In the diagnosis of congenital heart disease, there can be
as many as thirty-five states with non-zero probabilities in the 
current distribution. If there are twenty relevant tests at a given 
decision node, each with two possible results, a one-level evaluation 
of these tests could require the creation of forty distributions, 
each requiring the computation of thirty-five updated probabilities. 
This is a significant amount of processing for a highly interactive 
program, and the example cited does not represent a particularly 
large set of alternatives. Since the test-selection function may be 
performed many times during a diagnosis, there is a good reason to 
reduce the time required to perform it. An obvious approach is to 
improve the efficiency of the code for the function. While this 
would no doubt lead to improvements, it was not attempted. Atten- 
tion was focused on attempting to reduce the number of tests con- 
sidered, rather than reducing the time devoted to the evaluation 
of an individual test. 
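
The one-level evaluation described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code of the diagnostic program: each test result is treated as a single observation with a known conditional probability given the state, and the value of a test is taken to be its cost plus the expected loss of the best terminal decision after the result is known. All names and numbers are hypothetical. Each call to `bayes_update` creates one new distribution, so twenty two-result tests over thirty-five states would call it forty times and compute 1,400 updated probabilities.

```python
def bayes_update(prior, likelihood, result):
    """Return (P(result), posterior), where likelihood[state][result]
    is the conditional probability of the result given the state."""
    joint = {s: p * likelihood[s][result] for s, p in prior.items()}
    total = sum(joint.values())
    posterior = {s: v / total for s, v in joint.items()} if total > 0 else {}
    return total, posterior


def terminal_loss(dist, loss):
    """Smallest expected loss over all terminal decisions."""
    return min(sum(p * loss[d][a] for a, p in dist.items()) for d in loss)


def one_level_value(prior, test, loss):
    """Cost of the test plus the expected terminal loss after its result."""
    expected = 0.0
    for result in test["results"]:
        p_result, posterior = bayes_update(prior, test["likelihood"], result)
        if p_result > 0:
            expected += p_result * terminal_loss(posterior, loss)
    return test["cost"] + expected


# Hypothetical two-state problem with one two-result test.
prior = {"A": 0.5, "B": 0.5}
loss = {"A": {"A": 0.0, "B": 100.0}, "B": {"A": 100.0, "B": 0.0}}
test = {"cost": 1.0, "results": ["pos", "neg"],
        "likelihood": {"A": {"pos": 0.9, "neg": 0.1},
                       "B": {"pos": 0.2, "neg": 0.8}}}

stop_now = terminal_loss(prior, loss)          # loss of deciding immediately
run_test = one_level_value(prior, test, loss)  # loss of testing, then deciding
print(stop_now, run_test)
```

In this sketch a test is worth running when `run_test` is smaller than `stop_now`; the comparison is repeated for every candidate test at each decision node, which is where the computation accumulates.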

This approach was motivated by the results of the experiments 
with sequential diagnosis. There it was observed that relatively 
few tests were required for diagnosis by the program. The particu- 
lar set of tests employed for a given diagnosis is determined dynam- 
ically by the program, and varies from one diagnosis to another. If 
one could guess which tests would be relevant to a particular diag- 
nosis, the total number of tests considered could be reduced signifi- 
cantly. A guess about the relevance of certain tests must not be 
irreversible, however, because the value of some tests will become 
apparent only after other tests have been run. 

At any stage in a diagnosis, the current distribution provides 
the most logical basis for a hypothesis about the future relevance 
of particular tests. One heuristic which incorporates this view is 
the one which restricts the set of tests considered to those which 
are relevant to the state which is the best terminal decision at the 
current node. This heuristic favors those tests which tend to con- 
firm or disprove the current "best guess" about the problem. It 
also has the property of reversibility mentioned above. When the
terminal decision changes, the set of relevant tests changes
correspondingly.

This heuristic was employed in a number of experiments with 
both congenital heart disease problems and bone tumor problems. In 
the cases studied it resulted in the same number of tests selected 
as the standard function which employs no such heuristic. This 
heuristic does reduce the average number of decision nodes considered 
per diagnosis. This reduction is not great, however, because in both 
problem areas the diseases share many attributes in common, and hence 
many relevant tests. Thus, at any decision node, almost all 
the tests are relevant to the state determined to be the best terminal 
decision. 

A second heuristic which offered a potentially greater reduction 
in the number of decision nodes considered per diagnosis was also 
considered. This heuristic employs the current distribution to "guess" 
which tests will not be useful in the remainder of the diagnosis. 
Tests which are thought to have little value are temporarily removed 
from consideration. At a later point in the diagnosis these tests 
may be released for further consideration. 

The actual operation of this heuristic is as follows. At a 
given decision node, the set of relevant tests is evaluated by the 
test selection function. Then the set of tests is partitioned into 
two disjoint subsets. In the first are all those tests with the 
property that the sum of the cost of the test plus the expected loss 
of a terminal decision after the test has been run exceeds the ex- 
pected loss of the current terminal decision. These tests are said 
to be dominated . The second set consists of all the remaining un- 
dominated tests. The heuristic hypothesizes that the tests in the 
dominated set will remain dominated for the remainder of the diag- 
nosis. This set of tests is placed on the top of a push-down stack. 
At each decision node the push-down stack is examined prior to evalu- 
ating each test. If the test is found in the stack it is not con- 
sidered at the decision node. 

In general, then, each iteration of the test selection function 
produces a new set of dominated tests which are pushed onto the stack. 
This means the set of relevant tests is generally decreased at each 
stage of the diagnosis. Whenever there are no undominated tests at a 
given decision node (i.e. whenever the terminal decision is selected), 
the program releases the set of dominated tests (if one exists) on 
the bottom of the stack. This corresponds to re-evaluating those 
tests which were tentatively discarded earliest in the diagnosis. 
The reason for this choice is that it is desirable to reconsider 
tests which were dominated when the distribution was quite different 
from the present one. If the distribution has changed little, tests 
which were formerly dominated are apt to be currently dominated. Ac- 
tually, there is no guarantee that this method will produce the de- 
sired effect. It is used primarily as an example of a possible ap- 
proach, and additional discussion will be devoted to the subject below. 
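
The bookkeeping of the dominated-test heuristic can be sketched as follows. This is a sketch of the data structure only, under the description given above: a test is dominated when its cost plus the expected terminal loss after it exceeds the expected loss of the current terminal decision, dominated sets are pushed on top of a stack, and the set at the bottom is released first. The class and function names are illustrative, and the control loop that decides when to release is deliberately left out.

```python
from collections import deque


def partition(values, current_terminal_loss):
    """Split tests into (dominated, undominated); values[test] is the test
    cost plus the expected terminal loss after the test has been run."""
    dominated = {t for t, v in values.items() if v > current_terminal_loss}
    return dominated, set(values) - dominated


class DominatedTestStack:
    """Push-down stack of dominated test sets (right end is the top)."""

    def __init__(self):
        self._sets = deque()

    def push(self, dominated):
        if dominated:
            self._sets.append(frozenset(dominated))

    def is_suppressed(self, test):
        """True if the test is in any stacked set and should be skipped."""
        return any(test in s for s in self._sets)

    def release_oldest(self):
        """Remove and return the set at the bottom of the stack, if any."""
        return set(self._sets.popleft()) if self._sets else set()


# Hypothetical values at two successive decision nodes.
stack = DominatedTestStack()
dominated, undominated = partition({"T1": 5.0, "T2": 12.0, "T3": 9.0}, 10.0)
stack.push(dominated)
later_dominated, _ = partition({"T1": 11.0, "T3": 8.0}, 9.0)
stack.push(later_dominated)

released = stack.release_oldest()   # the set discarded earliest comes back
print(dominated, later_dominated, released)
```

Releasing from the bottom rather than the top is what makes the structure reconsider the tests that were judged under the oldest, and presumably most outdated, distribution.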

The "dominated-test" heuristic was tested in the sequential 
diagnosis of both the heart cases and bone tumor cases. The nine 
heart disease cases and the twelve bone tumor cases were used as the 
testing sample. The same initial attributes for a given case were 
given to both the "dominated-test" heuristic and the standard ver- 
sion of the diagnostic program. The number of tests by the program, 
the number of decision nodes considered during diagnosis, and the 
distribution at the terminal decision were all recorded. These re- 
sults are summarized in Tables 20 through 23. A number of these 
results have an interesting interpretation. 

In both the heart disease cases and the bone tumor cases, the 
dominated-test heuristic results in a substantial reduction in the 
average number of decision nodes considered per diagnosis. In the 
heart disease problem, this heuristic results in a larger average 
number of tests performed per diagnosis. In situations when the cost 
of an average test exceeds the value of the computation saved, this 
is an undesirable effect. The reason for this reduction in diagnostic
efficiency can be seen from the following interpretation of
the heuristic.

TABLE 20

Sequential Diagnosis of Heart Disease Cases
Standard Test Selection Function

             Number of    Number of        Number of
Case and     Initial      Tests Selected   Decision Nodes   Distribution at
Diagnosis    Attributes   by Program       Considered       Terminal Decision
1. D09       7            10               541              NORMAL 0.04, D03 0.69, D34 0.03,
                                                            D02 0.06, D18 0.05, D26 0.03
2. D04       7            4                287              D02 0.08, D04 0.17, D05 0.62,
                                                            D31 0.10
3. D02       7            1                133              D27 0.03, D32 0.96
4. NORMAL    7            10               523              NORMAL 0.07, D10 0.03, D11 0.07,
                                                            D12 0.02, D20 0.67, D24 0.01,
                                                            D28 0.10
5. D33       7            3                248              D08 0.92, D33 0.01
6. D32       7            0                66               D32 0.98, D29 0.01
7. D31       7            10               513              D04 0.01, D05 0.09, D31 0.86,
                                                            D32 0.02
8. D30       7            8                457              D02 0.03, D05 0.02, D20 0.01,
                                                            D30 0.89
9. D27       7            6                379              D11 0.02, D19 0.01, D24 0.06,
                                                            D26 0.06, D31 0.77, D33 0.03

Average number of tests by program = 5.8
Average number of decision nodes considered = 350

TABLE 21

Sequential Diagnosis of Heart Disease Cases
Dominated Test Heuristic

             Number of    Number of        Number of
Case and     Initial      Tests Selected   Decision Nodes   Distribution at
Diagnosis    Attributes   by Program       Considered       Terminal Decision
1. D09       7            11               283              NORMAL 0.04, D03 0.70, D05 0.01,
                                                            D02 0.06, D11 0.02, D18 0.05,
                                                            D26 0.03, D34 0.03
2. D04       7            5                163              D02 0.08, D04 0.16, D05 0.63,
                                                            D31 0.03
3. D02       7            1                66               D27 0.03, D32 0.96
4. NORMAL    7            16               345              NORMAL 0.50, D11 0.02, D15 0.02,
                                                            D20 0.24, D28 0.02
5. D33       7            3                176              D08 0.98, D33 0.01
6. D32       7            0                66               D32 0.98, D29 0.01
7. D31       7            11               269              D04 0.06, D05 0.14, D30 0.01,
                                                            D31 0.71, D32 0.07
8. D30       7            10               301              D02 0.04, D04 0.02, D05 0.02,
                                                            D18 0.03, D20 0.04, D30 0.70,
                                                            D31 0.08, D32 0.02
9. D27       7            6                216              D11 0.02, D31 0.77, D24 0.06,
                                                            D26 0.06, D33 0.03

Average number of tests by program = 7
Average number of decision nodes considered = 208

TABLE 22

Sequential Diagnosis of Bone Tumor Cases
Standard Test Selection Function

             Number of    Number of        Number of
Case and     Initial      Tests Selected   Decision Nodes   Distribution at
Pathology    Attributes   by Program       Considered       Terminal Decision
1. (GC)      7            9                269              GC 0.78, CB 0.21
2. (OS)      10           12               425              OS 0.35, CS 0.65
3. (CB)      9            0                0                CB 1.00
4. (CS)      10           4                223              CS 0.99
5. (OS)      10           4                194              OS 1.00
6. (RC)      10           13               406              RC 0.68, ES 0.30, FS 0.01
7. (CS)      8            4                228              CS 0.78, FS 0.22
8. (ES)      8            11               475              ES 0.05, FS 0.07, RC 0.87
9. (ES)      6            5                278              ES 0.88, RC 0.05, OS 0.05, CS 0.02
10. (CB)     10           3                109              CB 0.96, CF 0.04
11. (GC)     10           5                169              GC 0.81, CS 0.10, CF 0.08, ES 0.01
12. (PS)     10           3                142              PS 0.93, FS 0.02, CS 0.05

Average number of tests by program = 7.1
Average number of decision nodes considered = 243

TABLE 23

Sequential Diagnosis of Bone Tumor Cases
Dominated Test Heuristic

             Number of    Number of        Number of
Case and     Initial      Tests Selected   Decision Nodes   Distribution at
Diagnosis    Attributes   by Program       Considered       Terminal Decision
1. (GC)      7            7                151              CB 0.73, GC 0.26
2. (OS)      10           17               211              CS 0.66, OS 0.34
3. (CB)      9            0                0                CB 1.00
4. (CS)      10           5                148              CS 0.82, ES 0.09, FS 0.05, OS 0.05
5. (OS)      10           3                139              OS 0.92, ES 0.02, CS 0.03, RC 0.03
6. (RC)      10           14               218              RC 0.70, ES 0.29
7. (CS)      8            4                180              CS 0.74, FS 0.26
8. (ES)      8            15               294              RC 0.90, FS 0.05, ES 0.03
9. (ES)      6            5                137              ES 0.87, CS 0.03, FS 0.01, OS 0.04,
                                                            RC 0.04
10. (CB)     10           3                97               CB 0.96, CF 0.04
11. (GC)     10                            119              GC 0.81, CS 0.10, ES 0.01, CF 0.08
12. (PS)     10                            106              PS 0.92, CS 0.05, FS 0.02

Average number of tests by program = 6.6
Average number of decision nodes considered = 150

This heuristic simulates to a certain extent the diagnostic 
strategy of one who seizes upon an initial view of the problem and 
later yields that view with considerable reluctance. Thus, the 
program makes a guess as to which tests will prove important at an 
early stage in the diagnosis, and thereafter restricts its attention 
to those tests as long as some appear to be useful. The difficulty 
is that the view on which the guess was made may not be an accurate 
one. Although the tests being considered may be of some value, 
there may be other tests, temporarily disregarded, which may be of 
greater value. Unfortunately, the heuristic is not sufficiently 
sensitive to changes in the current distribution, and it may cause 
relatively unfruitful paths to be pursued to an unnecessary extent. 
When it eventually abandons such a path and re-evaluates the formerly 
dominated tests, it may already have incurred unnecessary testing 
costs. The heuristic exhibits a "single-mindedness" which results 
in less than satisfactory performance. 

In the bone tumor cases, this heuristic reduced both the 
average number of decision nodes considered and the average number 
of tests run. Here its failing is a loss of accuracy. This effect 
is extremely interesting. Apparently in its pursuit of an informa- 
tive series of tests, the program succeeds in obscuring much of the 
information implicit in the initial attributes. As a result, when 
the dominated tests are finally released for consideration, the
current distribution is sufficiently altered that the program does
not find additional tests worthwhile. This effect may be the cause 
of the results for case 1. Here the dominated test heuristic se- 
lected fewer tests in arriving at a less satisfactory diagnosis than 
the standard test selection function. 

While the heuristic in question has some shortcomings, it does 
indicate a certain amount of promise. What it seems to lack is an 
awareness of changes in the current distribution which should cause 
certain dominated tests to be released for consideration. One possi- 
ble solution is to save the current distribution with a set of 
dominated tests. This would allow the program to compare the pres- 
ent distribution with one in the stack to determine whether the view 
of the problem has changed sufficiently to warrant the release of 
the tests. This comparison could also account for the relative 
seriousness of states in deciding whether a given change was
significant.
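
The proposed improvement can be sketched as follows. This is one possible reading of the suggestion, not something implemented in the thesis: each set of dominated tests is saved with the distribution under which it was dominated, and the set is released when a seriousness-weighted measure of change exceeds a threshold. The particular measure (a weighted sum of absolute probability changes), the weights, and the threshold are all assumptions made for illustration.

```python
def weighted_change(old, new, seriousness):
    """Sum over states of |change in probability| times a seriousness weight
    (weight 1.0 for states not listed in `seriousness`)."""
    states = set(old) | set(new)
    return sum(seriousness.get(s, 1.0) * abs(new.get(s, 0.0) - old.get(s, 0.0))
               for s in states)


def release_changed(stack, current, seriousness, threshold):
    """stack is a list of (saved_distribution, dominated_tests) pairs.
    Returns (released_tests, remaining_stack)."""
    released, kept = set(), []
    for saved, tests in stack:
        if weighted_change(saved, current, seriousness) >= threshold:
            released |= set(tests)
        else:
            kept.append((saved, tests))
    return released, kept


# Hypothetical example: the probability of the serious state CB has fallen
# sharply since the first set was stacked, but not since the second.
current = {"GC": 0.9, "CB": 0.1}
stack = [({"GC": 0.6, "CB": 0.4}, {"T2", "T5"}),
         ({"GC": 0.9, "CB": 0.1}, {"T7"})]
released, kept = release_changed(stack, current, {"CB": 100.0}, threshold=5.0)
print(released, kept)
```

With such a rule a set of tests could be released from any position in the stack, rather than only from the bottom, as soon as the view of the problem under which it was discarded no longer holds.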

This example is but one of a number of heuristics which can be 
studied in the diagnostic system. Because very large decision trees 
may be encountered in future applications, a variety of tree-pruning 
heuristics should be studied. 

Experiment 9. Exercise of the Pattern-Sorting Capability 

A small example was constructed with which the pattern-sorting 
capability could be tested. This example consisted of six states 
and fifteen attributes. The matrix for the example is presented in 
Table 24. The states in this example can be partitioned into two 
sets which have the property that certain attributes are specific 
to the states in a group and other attributes are shared by the two 
groups. The generator was employed to simulate case histories with 
noise attributes. That is, a case history for a state in the first 
group included one or more attributes selected from those specific 
to the states in the second group. 

TABLE 25

Loss Function Matrix for Six State Problem
(in thousands)

            DONE    DTWO    DTHREE    DFOUR    DFIVE    DSIX
DONE           0       1         1        1        1       1
DTWO         100       0         1        1        1       1
DTHREE       100       1         0        1        1       1
DFOUR        100       1         1        0        1       1
DFIVE        100       1         1        1        0       1
DSIX         100       1         1        1        1       0

Consider the following diagnostic problem with the loss function
as specified in Table 25. The initial attributes are S10, S12,
S13 and S04. These attributes cannot be attributed to a single state,
and so the pattern-sorting function produces more than one pattern.
In this case the patterns formed are (S04) and (S10, S12, S13). For
each of these patterns the distribution over states is obtained
assuming that the given pattern is the only one. These distributions
are:

1) (S04):  DONE   0.24        2) (S10, S12, S13):  DFOUR  0.42
           DTWO   0.11                             DFIVE  0.02
           DTHREE 0.65                             DSIX   0.57

Based on these distributions, the pattern-sorting function selects
the current pattern. Here the choice is pattern 1 although it
contains only one attribute. From the loss function matrix, it can be
seen that state DONE is very serious. Since state DONE can exhibit
S04, the posterior probability of DONE given S04 is non-zero (0.24).
By considering both posterior probabilities and losses, the
pattern-sorting function selects pattern 1 as the more serious, and
hence it becomes the current pattern. Tests are selected relative to
this pattern, but any new attributes are processed through the entire
pattern stack as discussed in Chapter 4. In this particular example, the
program continued diagnosis until the following situation was obtained:

    1. (S02, S04, NOT S06, S07, NOT S08, NOT S09)

          DONE    0.92
          DTHREE  0.08

    2. (S10, S12, S13, S02)

          DFOUR   0.62
          DFIVE   0.01
          DSIX    0.37



The program then tentatively attributed pattern 1 to state DONE. 
This left S10, S12, and S13 unaccounted for. At this point, the 
user terminated the diagnosis. Had he wished, he could have pursued 
the investigation. If the original pattern were then shown to be 
invalid, the attributes in it would be returned to the unaccounted-for 
set and the pattern would be removed from the stack. 
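
The choice of pattern 1 in this example can be illustrated numerically. 
The sketch below is not the program's pattern-sorting function; it 
assumes, purely for illustration, that the "seriousness" of a pattern 
is the smallest expected loss attainable if a terminal decision were 
made at once from that pattern's distribution. It further assumes that 
the rows of Table 25 index the diagnosis made, the columns index the 
actual state, and the blank diagonal denotes zero loss. The names 
loss, expected_loss and bayes_risk are hypothetical. 

```python
STATES = ["DONE", "DTWO", "DTHREE", "DFOUR", "DFIVE", "DSIX"]


def loss(decision, actual):
    """Table 25 (in thousands) under the assumed row/column orientation."""
    if decision == actual:
        return 0  # assumption: the blank diagonal means no loss
    # Every off-diagonal entry is 1 except the DONE column, which is 100.
    return 100 if actual == "DONE" else 1


def expected_loss(decision, dist):
    """Expected loss of a terminal decision given a distribution over states."""
    return sum(p * loss(decision, state) for state, p in dist.items())


def bayes_risk(dist):
    """Least costly immediate decision and its expected loss."""
    best = min(STATES, key=lambda d: expected_loss(d, dist))
    return best, expected_loss(best, dist)


# Initial distributions quoted in the text for the two patterns.
pattern_1 = {"DONE": 0.24, "DTWO": 0.11, "DTHREE": 0.65}
pattern_2 = {"DFOUR": 0.42, "DFIVE": 0.02, "DSIX": 0.57}

for name, dist in [("pattern 1", pattern_1), ("pattern 2", pattern_2)]:
    decision, risk = bayes_risk(dist)
    print(f"{name}: decision {decision}, expected loss {risk:.2f}")
```

Under these assumptions pattern 1 carries the larger residual loss 
(about 0.76 against 0.44), and its least costly immediate decision is 
DONE even though DTHREE is the most probable state. This is consistent 
with the selection of pattern 1 as the more serious pattern. 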

A variety of such experiments were run with the pattern-sorting 
function and the results indicated that the particular scheme embodied 
in the function exhibits the desired properties. This function needs 
to be studied more extensively, however, especially in more complicated 
situations. Although this area was somewhat slighted in this research, 
the environment provided by the diagnostic system should be a good 
one in which to pursue such a study. 



Chapter 8 
DISCUSSION OF THE RESEARCH 

The research discussed in the preceding chapters suggests 
a number of questions and issues which merit additional comment. 
In this chapter an attempt is made to draw together a number of 
results and to consider their potential generality. Also of interest 
here are some of the possible extensions of this research which aim 
at developing a more sophisticated system for the study and perform- 
ance of diagnosis. 

One of the more obvious questions involves the evaluation of 
the performance of the current diagnostic program. This question 
is important for two reasons. First, one of the principal hypothe- 
ses considered in this research was that in a variety of problem 
areas, a computer program could prove a competent or superior diag- 
nostician. The current program has been applied to a number of 
cases, simulated and actual, of bone tumor and congenital heart di- 
sease. Hence a reasonable question is how well did it perform. A 
second reason for establishing a meaningful performance measure is 
so that it can be used in studies of various diagnostic strategies. 
If one test selection heuristic is to be judged superior to another, 
the judgment must be based on a measure of performance, and that 
measure should reflect diagnostic capability. So there is a very 
real need for a good measure of diagnostic performance. 

Unfortunately, while the need for a performance measure is clear, 
the precise nature of such a measure is open to a number of ques- 
tions. Perhaps the best way to approach the problem is to catalog 
those qualities for which a diagnosis is generally judged to be a 
good one. The most obvious of these qualities is the accuracy of the 
diagnosis. The object of diagnosis as stated in the beginning of 
this thesis is to ascertain the state of a system. All other things 
being equal, the more accurate the determination of the state of the 
system, the better the diagnosis. By itself, however, this quality 
has relatively little meaning. One desires to know the state of a 
system in a diagnostic problem because this knowledge is an input to 
a subsequent decision (e.g. the decision about a treatment plan for 
a medical problem). Accuracy is not sought for its own sake, but 
rather for its improvement of decisions which result from the diagno- 
sis. If these latter decisions are independent of any particular al- 
ternative in a group of diagnostic decisions, then there is no bene- 
fit to be accrued from distinguishing one of this group from another. 
From the point of view of further decisions, the states corresponding 
to these decision alternatives constitute an equivalence class. If 
a doctor knows that a patient has one of three viruses, all of which 
would be treated in the same manner, there may be no value in 
attempting to deduce the "actual" virus. 

If one were interested in accuracy as the chief quality of good 
diagnosis, he could contend that in the above example, the doctor 
was accurate in diagnosing the problem as one of three viruses and 
that this can be thought of as identifying the state of the patient. 
A simple extension of this example makes this objection less forceful, 
however. Suppose that each of the three viruses is treated in a 
different manner and that there is a loss for diagnosing any one as 
another, but in each case this loss is less than the testing loss 
required to distinguish one from another. Again the identification 
of the goal of diagnosis as accuracy seems incomplete. The point is 
that accuracy is sought only to an extent commensurate with the ex- 
pected consequences of a diagnostic decision about the system and 
the expected cost of obtaining greater accuracy. 

This view of the diagnostic process has been the basis for this 
research. From the point of view of the diagnostician, the goal 
of diagnosis is to minimize the sum of the testing loss and the ex- 
pected decision loss. Conceivably a diagnostician could correctly 
ascertain the state of a system at such a testing cost that his diag- 
nosis would be judged inferior. 

While it is appropriate for a diagnostician to consider expected 
loss for misdiagnosis as a factor in determining the course of a 
diagnosis, this quantity is not necessarily relevant to the judgment 
of his diagnostic performance. The principal reason for this is that 
the expected loss depends on the probability distribution over states 
which is held by the diagnostician at the time of a terminal decision. 
Since the diagnostician chooses tests, this distribution reflects 
his testing strategy as well as the actual problem. Basing a per- 
formance measure on expected loss ignores the relative merits of dif- 
ferent testing strategies. It is as though a doctor were to be given 
a high performance rating simply because he believed very strongly 
that he had discovered the patient's problem. This strong belief may 
well be founded on incomplete or irrelevant information. 

A more satisfactory way of assessing diagnostic performance is 
to simply add the testing loss to the actual decision loss. That 
is, judge the act rather than the intent. Ideally, one could deter- 
mine the actual decision loss by comparing the actual state of the 
system (when it becomes known) with the diagnostic decision and 
determining the loss attributable solely to the difference between 
the two. By this standard, a diagnostician who consistently mini- 
mized the sum of testing and decision losses would be judged to be 
superior. Some of the problems inherent in this measure are rather 
obvious. First, the actual state of the system may never be known 
with certainty. A patient who is diagnosed and treated may never 
return for further examination, and hence a serious misdiagnosis may 
never be uncovered. A second problem is the difficulty in appor- 
tioning the decision loss to various diagnostic decisions. Also, 
the loss itself may be very difficult to ascertain. Nonetheless, 
this measure does seem to subsume the desired properties, and al- 
though it may be difficult to apply, it does seem to be a standard 
to be sought. 
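
The difference between the expected loss held by the diagnostician 
and the total loss measure proposed here can be made concrete. The 
following is a minimal sketch; the case records, the two-state loss 
matrix and all names (Case, LOSS, expected_decision_loss, total_loss) 
are hypothetical and are not drawn from the experiments. 

```python
from dataclasses import dataclass


@dataclass
class Case:
    test_costs: list  # cost of each test actually performed
    decision: str     # terminal diagnostic decision
    belief: dict      # distribution over states held at decision time
    actual: str       # state later found to be the true one


# Hypothetical loss matrix, indexed LOSS[decision][actual].
LOSS = {
    "S1": {"S1": 0.0, "S2": 50.0},
    "S2": {"S1": 10.0, "S2": 0.0},
}


def expected_decision_loss(case):
    """Loss anticipated by the diagnostician, given his own belief."""
    return sum(p * LOSS[case.decision][s] for s, p in case.belief.items())


def total_loss(case):
    """Proposed measure: testing loss plus the actual decision loss."""
    return sum(case.test_costs) + LOSS[case.decision][case.actual]


confident_but_wrong = Case([2.0], "S1", {"S1": 0.99, "S2": 0.01}, "S2")
cautious_and_right = Case([2.0, 5.0], "S2", {"S1": 0.30, "S2": 0.70}, "S2")

for case in (confident_but_wrong, cautious_and_right):
    print(expected_decision_loss(case), total_loss(case))
```

In this constructed example the first diagnostician anticipates the 
smaller loss (0.5 against 3.0) only because he holds a strong belief, 
whereas the total loss measure ranks him far below the second (52 
against 7). The act is judged, not the intent. 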

Another consideration in evaluating diagnostic decisions couched 
in terms of probabilities is the interpretation of probability dis- 
tributions. For example, what are the implications of a diagnosis 
of (0.75, 0.25) for the states S1 and S2 for a performance measure? 
To a large extent, it depends on the actions which are taken based 
on this diagnosis. Suppose the actual state is S2. How does this 
affect the evaluation of this diagnosis? If only a single action 
can be taken on this diagnosis and it is based on the belief that the 
state is S1, the problem is even more difficult. The influence of 
such a distribution on a human decision maker may be quite subtle. 
If individuals react differently to such distributions, the problems 
will be compounded. 

Finally, some effort should be made to normalize performance 
measures. Certain problems may be inherently more difficult to 
diagnose than others. For this reason, it is important to obtain 
an understanding of the limitations placed upon even the most expert 
diagnostician by the very nature of the problem before him. 

The evaluation of the performance of the diagnostic program 
in the particular problem areas of bone tumors and congenital heart 
disease is made more difficult by the lack of a well-defined loss 
structure for these problems. This precludes the use of the total 
loss measure discussed above. An alternative approach is to compare 
the program performance with standards based on the performance of ex- 
perienced doctors. Even this approach is somewhat indirect in this 
case. Since no studies of doctor performance with the particular 
case histories used were performed, no immediate comparisons based 
solely on the results of this research are possible. Some indica- 
tion of program performance, however, can be obtained in the follow- 
ing way. The problems of bone tumor diagnosis and heart disease 
diagnosis have been studied extensively by Lodwick and Warner res- 
pectively. Both developed computer programs to perform diagnosis 
and have compared the performance of these programs with that of ex- 
perienced physicians. These comparisons suggested that the programs 
performed diagnosis of a quality comparable to that of an experienced 
physician when all attributes were presented to both physician and 
program. The fact that the current diagnostic program duplicates 
the results of these programs on the cases studied suggests that the 
current program would fare equally well in a comparison with physicians. 
In the absence of a performance measure, this is the strongest state- 
ment which the experimental evidence will support. 

If one tentatively accepts this suggestion, then a second sig- 
nificant conclusion can be derived from the results of these experi- 
ments. The diagnostic program was able to solve problems in two 
different areas of medical diagnosis. These areas differ in both 
the number of diseases and the complexity of inter-attribute rela- 
tionships which are considered. The latter aspect is particularly 
important because it was handled without changing the program. Since 
the experiments involved only two problem areas and both were medical, 
the applicability of the program for a wide class of problems has 
not been established. Its success in the two areas mentioned, 
however, strengthens the belief that it does have wider applica- 
bility. 

The fact that the program is independent of the content of 
the information structure might be of significant value in the use 
of the program with hierarchical structures. Consider, for example, 
the problem of diagnosing a very large set of diseases. One possi- 
bility would be to create a hierarchical structure in which many 
sub-structures exist. The structures for bone tumors and congenital 
heart disease might be such sub-structures. At the higher levels, 
the states would be classes of diseases, such as heart disease. 
The goal of diagnosis at higher levels would be to determine the 
proper class of disease. When this determination had been made, a 
more detailed sub-structure for that disease class would be employed 
for a "finer" diagnosis. The same diagnostic program could deal 
with all sub-structures. This would be a great improvement over a 
large set of programs, one for each sub-structure. 
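
The idea of a single routine serving every level of such a hierarchy 
can be sketched as follows. This is an illustration under stated 
assumptions, not a description of the program: attributes are treated 
as independent given the state (the inference function of the program 
is more flexible than this), and the structures, attribute names and 
probabilities are all hypothetical. 

```python
def posterior(structure, observed):
    """Bayes' rule over the states of one information structure.

    structure: {state: (prior, {attribute: P(attribute present | state)})}
    observed:  {attribute: True or False}
    Attributes are assumed independent given the state, and attributes
    that a structure does not mention are ignored.
    """
    scores = {}
    for state, (prior, cond) in structure.items():
        score = prior
        for attr, present in observed.items():
            if attr in cond:
                score *= cond[attr] if present else 1.0 - cond[attr]
        scores[state] = score
    total = sum(scores.values())
    return {s: v / total for s, v in scores.items()}


# Hypothetical top-level structure: the states are classes of disease.
TOP = {
    "HEART": (0.5, {"murmur": 0.8, "bone_pain": 0.1}),
    "BONE": (0.5, {"murmur": 0.1, "bone_pain": 0.9}),
}

# Hypothetical sub-structures, one per class of disease.
SUB = {
    "HEART": {"H1": (0.7, {"cyanosis": 0.2}), "H2": (0.3, {"cyanosis": 0.9})},
    "BONE": {"B1": (0.6, {"sharp_edge": 0.9}), "B2": (0.4, {"sharp_edge": 0.2})},
}


def hierarchical_diagnosis(observed):
    """Choose the disease class first, then refine within its sub-structure."""
    top = posterior(TOP, observed)
    best_class = max(top, key=top.get)
    fine = posterior(SUB[best_class], observed)
    return best_class, top, fine


cls, top, fine = hierarchical_diagnosis(
    {"murmur": True, "bone_pain": False, "cyanosis": True}
)
print(cls, fine)
```

The same posterior routine is applied at both levels; only the 
structure passed to it changes. 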

Again, considering the results of diagnosing actual case his- 
tories, one can readily appreciate the advantage of sequential diag- 
nosis. In the particular problems studied, the program was able to 
arrive at a diagnosis with the use of relatively few tests. This 
capability is very important since the testing cost for a diagnosis 
may be a significant part of the total cost. Tests which are un- 
necessary or uninformative may exact a high price, and an effort 
should be made to restrict the tests run to those essential to the 
diagnosis. The sequential test selection facility permits the pro- 
gram to dynamically assess the potential usefulness of each possible 
test. This results in efficient testing strategies, an important 
component of good diagnosis. 
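
The dynamic assessment of a test's usefulness can be illustrated by 
a depth-one lookahead, in which a test is worth running only if the 
expected reduction in decision loss exceeds its cost. The sketch 
below is offered as an illustration and is not the heuristic actually 
implemented in the program. It assumes binary test outcomes, and the 
two-state problem, test names, costs and likelihoods are hypothetical. 

```python
def bayes_risk(belief, loss):
    """Smallest expected loss over terminal decisions; loss[decision][state]."""
    return min(sum(p * loss[d][s] for s, p in belief.items()) for d in loss)


def update(belief, likelihood, positive):
    """Posterior after one result, and the probability of that result.

    likelihood[s] is P(result positive | state s).
    """
    joint = {
        s: p * (likelihood[s] if positive else 1.0 - likelihood[s])
        for s, p in belief.items()
    }
    total = sum(joint.values())
    if total == 0.0:  # result impossible under the current belief
        return dict(belief), 0.0
    return {s: v / total for s, v in joint.items()}, total


def net_value(belief, test, loss):
    """Expected reduction in decision loss minus the cost of the test."""
    cost, likelihood = test
    risk_after = 0.0
    for positive in (True, False):
        post, prob = update(belief, likelihood, positive)
        risk_after += prob * bayes_risk(post, loss)
    return bayes_risk(belief, loss) - risk_after - cost


def select_test(belief, tests, loss):
    """Name of the most valuable test, or None when no test pays for itself."""
    name = max(tests, key=lambda t: net_value(belief, tests[t], loss))
    return name if net_value(belief, tests[name], loss) > 0 else None


LOSS = {"A": {"A": 0.0, "B": 10.0}, "B": {"A": 10.0, "B": 0.0}}
TESTS = {  # name: (cost, P(positive | state))
    "cheap_sharp": (1.0, {"A": 0.9, "B": 0.1}),
    "costly_vague": (3.0, {"A": 0.6, "B": 0.4}),
}

print(select_test({"A": 0.5, "B": 0.5}, TESTS, LOSS))
print(select_test({"A": 0.99, "B": 0.01}, TESTS, LOSS))
```

With an even belief the inexpensive, discriminating test is selected. 
Once the belief is already strong, no test repays its cost and the 
routine recommends that testing stop. 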

In a problem area in which the tests relevant to different 
groups of states are relatively disjoint, the value of sequential 
testing should be even greater. Once the appropriate group of states 
has been established, the tests considered can be restricted to the 
set of tests associated with that group. In the absence of a sequen- 
tial testing capability, it may be necessary to perform all tests to 
obtain information which could have been obtained from a few. The 
striking reduction in the number of tests required for diagnosis of 
bone tumors and congenital heart disease effected by sequential testing 
strongly suggests the potential value of this approach in other diag- 
nostic problems. 

The existence of a diagnostic system rather than just a diagnos- 
tic program has proved quite important in this research. Many of 
the strategies which were considered are quite complicated, and it is 
difficult to predict a priori the manner in which they will perform. 
The generator has been very useful in testing these strategies under 
a variety of problem conditions. Also of use has been the facility 
for selectively monitoring particular diagnostic functions such as 
pattern-sorting and test selection by collecting detailed data on 
their operations. 

One virtue of the inclusion of a generator in the diagnostic 
system is that it makes it possible to study the performance of the 
diagnostic program in problems derived from a wide range of informa- 
tion structures. The simulation capability frees the researcher 
from dependence on actual case histories. Thus he can create struc- 
tures and simulated cases specifically designed to test some aspect 
of the diagnostic program. The use of the simulation facility with 
an information structure corresponding to an actual diagnostic 
problem may also be very useful in the study of that particular 
problem. 

Complementing this capability is that of operating the diagnos- 
tic program in an interactive mode. Thus a user can employ the pro- 
gram in actual diagnostic problems. This "open end" of the system 
permits the independent testing of strategies developed through re- 
search, as well as making the diagnostic program a practical aid to 
problem solving. The experience gained in this research indicated 
the value of such a system which permits the study of both actual 
and artificial diagnostic problems. It seems that this type of 
system would prove most useful in further development of sophisti- 
cated strategies for computer-aided diagnosis. 

Finally, the modularity of the system is very important. On 
the one hand, the insulation of the system functions from one another 
permits one to study a wide variety of diagnostic strategies since 
the functions can be changed independently of one another. Also as 
better versions of these functions are developed, they can be incor- 
porated into the system without restructuring it. In this sense, the 
performance of the system can be improved as additional experience 
with it is obtained. 

The experience obtained with the diagnostic system has pointed 
to a number of areas for further research. A number of these areas 
are discussed here. Some pertain to specific improvements in the 
diagnostic capabilities of the program, while others have more gen- 
eral ramifications. 

In Chapter 7, certain experiments to study the effect of the loss 
function on diagnosis were discussed. While these experiments are by 
no means exhaustive, they do indicate the strong effect the loss 
function can exert on diagnoses obtained by the program. Two major 
questions need to be investigated in this regard. The first is how 
such a loss function can be developed for a particular problem area, and 
the second is in what ways is diagnosis sensitive to the actual values 
of a loss function. 

The first question is a very difficult one to answer. Assuming 
for the moment that the matrix form of the loss function is retained, 
the problem is to determine the "seriousness" of each possible mis- 
diagnosis in some appropriate units. For example, in the context of 
medical diagnosis, one must answer questions such as "How serious is 
the diagnosis of pneumonia as influenza and vice versa?" This answer 
must be in such terms as to permit the comparison of a wide variety 
of misdiagnoses in an orderly manner. If one considers the extreme 
range of consequences resulting from misdiagnoses in medicine, he 
can appreciate the magnitude of this task. As stated, the problem 
requires the establishment of a common scale for such extremes as 
the failure to diagnose a simple cold and the failure to diagnose 
cancer. 

In many instances, the loss for a misdiagnosis depends on many 
extraneous factors, such as whether a patient will return to the doc- 
tor when his symptoms persist. The loss may also depend on decisions 
made after the diagnosis which are difficult to predict. Compound- 
ing the problem of the loss function is the need to convert the test- 
ing loss to the same scale. In particular areas, one may be confronted 
with further complications in this regard. For example, the question 
of a loss function for medical diagnosis is also a question of whose 
loss function should be employed. One could answer that the loss 
function should be that of the patient. The loss function of the doc- 
tor, and that of society, however, are also possible answers to this 
question. If a diagnostic system were created for general use in 
medical diagnosis, questions such as these would have to be considered. 

Although the problems of determining the loss function for 
an area as complex as medical diagnosis would be very great, they 
may well prove worth the effort of solution. If the value of a pro- 
gram for diagnosis in a given area can be clearly demonstrated to 
be considerable, this would be strong motivation for work on an ap- 
propriate loss function. As currently conceived, such a diagnostic 
program would make extensive use of losses in directing a diagnosis. 
These losses should reflect the best understanding of the conse- 
quences of possible decisions. In some areas, the development of a 
loss function might be a valuable exercise independent of the im- 
plementation of a diagnostic program. In areas where sophisticated 
diagnosis is currently being performed by human beings, a loss func- 
tion is often implicit. The attempt to quantify this loss function 
may reveal inconsistencies and expose implicit losses of questionable 
merit. To the extent that this situation obtains in a particular 
area, there is additional motivation for research into this problem. 

Such research would involve investigation of means of quantify- 
ing and scaling diverse consequences as well as considerations of the 
best form which the loss function should take. To a large extent, a 
framework for these investigations has already been established. A 
number of workers in the areas of statistical decision theory, game 
theory, and economics (R21, R22) have considered many of the prob- 
lems associated with the attempt to scale decision alternatives. 
While this work is far from complete, it does provide a reasonable 
basis for some of the initial studies. This whole area is rich with 
problems of interest and importance. 

Another important area for research is the development of a 
diagnostic program which includes improved solutions to a number of 
different problems, some of which are discussed here. 

As previously noted, the test selection function merits particu- 
lar attention. This function serves a central purpose in the over- 
all diagnostic strategy of the program, and as a result, significant 
improvements in this area would directly promote the diagnostic capa- 
bility of the program. More sophisticated test selection heuristics 
are required if the program is to deal successfully with problems 
involving large numbers of decision and testing alternatives. All 
the test selection heuristics employed in this research are "fixed- 
depth" in the sense that they explore all branches away from a given 
decision node to a fixed depth in the decision tree. Most likely a 
better test selection function would explore branches to varying 
depth, pursuing further those branches which appeared more promis- 
ing. The difficulty yet to be overcome in this regard is the es- 
tablishment of some measure of "promise" for branches in the decision 
tree. This problem has been encountered in other applications of 
heuristic programming, and it can be expected that significant re- 
sults in the diagnostic problem would be of more general applica- 
bility. Similarly, if powerful test selection heuristics can be 
developed, they might be of considerable value in a variety of 
sequential decision problems. 

Another improvement to the diagnostic program would allow it 
to take advantage of various relationships among tests. For example, 
if one is going to perform a certain test, it may be advantageous 
to perform another test as well because it is inexpensive when run 
in conjunction with the first test. The inclusion of more complete 
information about tests in the information structure might allow 
the program to exploit various inter-test relationships and to select 
groups of tests to be run during diagnosis. 

The pattern-sorting function needs to be bolstered by the addi- 
tion of facilities for assessing the accuracy of the attributes 
provided to it by the user. Just as it is important to detect noise 
attributes, it is equally important that the presence of false in- 
formation be discovered. Undoubtedly only partial solutions to this 
problem are possible, but additional capabilities of this kind, 
even if somewhat limited, would be of considerable value in appli- 
cations of the program to actual diagnostic problems. For example, 
the program could include a means for incorporating estimates of 
the reliability of tests into both the pattern-sorting and inference 
functions. 
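
One simple way in which an estimate of test reliability might enter 
the inference function is sketched below. It assumes a symmetric 
error model in which a reported attribute is correct with probability 
r regardless of the state, so that the probability of a report of 
"present" given a state is r p + (1 - r)(1 - p), where p is the 
conditional probability of the attribute given that state. This model 
and the names used are assumptions made for illustration; they are 
not features of the current program. 

```python
def reported_likelihood(p_present, reliability):
    """P(attribute reported present | state) under a symmetric error model."""
    return reliability * p_present + (1.0 - reliability) * (1.0 - p_present)


def posterior(prior, cond, reported, reliability=1.0):
    """Posterior over states after one reported attribute.

    prior: {state: P(state)}
    cond:  {state: P(attribute truly present | state)}
    reported: True if the attribute was reported present
    reliability: probability that the report is correct
    """
    joint = {}
    for state, p in prior.items():
        q = reported_likelihood(cond[state], reliability)
        joint[state] = p * (q if reported else 1.0 - q)
    total = sum(joint.values())
    return {s: v / total for s, v in joint.items()}


prior = {"D1": 0.5, "D2": 0.5}
cond = {"D1": 0.95, "D2": 0.05}

trusted = posterior(prior, cond, reported=True, reliability=1.0)
doubtful = posterior(prior, cond, reported=True, reliability=0.8)
useless = posterior(prior, cond, reported=True, reliability=0.5)
print(trusted["D1"], doubtful["D1"], useless["D1"])
```

As the reliability falls from 1.0 toward 0.5, the report moves the 
distribution less and less, and at 0.5 it leaves the a priori 
probabilities unchanged. 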

A number of improvements can be made in the inference function 
of the program. One of these is the incorporation of a learning 
scheme within this function. Such a scheme would permit the pro- 
gram to learn the a priori probabilities for the various states as 
well as the conditional probabilities of attributes given states. 
The Bayesian framework provides a convenient structure within which a 
learning scheme can be developed. Learning of this type is especially 
important if the relevant probabilities vary with the specific appli- 
cation. For example, if the information structure for congenital 
heart disease were employed in a region of the country other than 
that in which it was developed, the probabilities might require ad- 
justment to reflect changes in the characteristics of the popula- 
tion of potential patients. The program can obtain the information 
required for such an adjustment from the actual diagnoses which it 
performs on patients from the new population provided that other 
means of obtaining diagnoses are available. Thus in certain appli- 
cations, the diagnostic program may require a training period in 
which it can alter the contents of the information structure to more 
accurately reflect the relevant behavior of the given system. A 
variety of learning schemes should be investigated to develop a 
scheme which will be suited for this problem. 

Some of the considerations involved in research of this kind 
are apparent at the outset. If the probabilities of interest are 
relatively stable, then a rather prolonged learning period may be 
acceptable in the hope that these probabilities will be learned accu- 
rately. On the other hand, if the probability structure of the 
problem is relatively dynamic, then more rapid learning may be re- 
quired. One difficulty with the latter situation is that rapid 
learning implies a greater weighting of recent experiences and if the 
environment is noisy, this may lead to poor probability estimates, 
and hence to poor diagnosis. One possibility is to exploit the 
ability of the human diagnostician to perceive patterns and trends 
by allowing him to influence probability estimates dynamically. For 
instance, a doctor might be better able to detect the early stages 
of an epidemic and hence adjust the a priori probability of the 
prevalent disease to reflect its increased incidence. 
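
The trade-off between prolonged and rapid learning, and the possi- 
bility of human intervention, can be sketched with a simple counting 
scheme. The class below is a hypothetical illustration and not a 
scheme investigated in this research. It assumes that confirmed 
diagnoses are available, treats the initial probabilities as a number 
of pseudo-cases, and uses a decay factor to discount older experience 
(a decay of 1.0 weights all cases equally). Only the a priori proba- 
bilities are treated; conditional probabilities could be handled in 
an analogous manner. 

```python
class PriorLearner:
    """Running estimate of a priori state probabilities from confirmed cases."""

    def __init__(self, initial, weight=10.0, decay=1.0):
        # The initial probabilities count as `weight` pseudo-cases.
        self.counts = {s: p * weight for s, p in initial.items()}
        self.decay = decay

    def observe(self, confirmed_state):
        """Record one confirmed diagnosis, discounting older experience."""
        for s in self.counts:
            self.counts[s] *= self.decay
        self.counts[confirmed_state] += 1.0

    def probabilities(self):
        total = sum(self.counts.values())
        return {s: c / total for s, c in self.counts.items()}

    def set_prior(self, state, probability):
        """Human override of one probability; the others are rescaled.

        Assumes the remaining states currently hold some non-zero weight.
        """
        total = sum(self.counts.values())
        rest = total - self.counts[state]
        for s in self.counts:
            if s == state:
                self.counts[s] = probability * total
            else:
                self.counts[s] *= (1.0 - probability) * total / rest


slow = PriorLearner({"FLU": 0.2, "COLD": 0.8}, decay=1.0)
fast = PriorLearner({"FLU": 0.2, "COLD": 0.8}, decay=0.8)
for _ in range(5):  # five confirmed cases of FLU in succession
    slow.observe("FLU")
    fast.observe("FLU")
print(slow.probabilities()["FLU"], fast.probabilities()["FLU"])
```

After the same five cases the learner with decay responds more 
strongly than the one without, which is the behavior desired in a 
dynamic environment and the source of poor estimates in a noisy one. 
The set_prior method corresponds to the doctor who adjusts a proba- 
bility directly upon recognizing the early stages of an epidemic. 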

Some Comments on the Diagnostic Model 

When one devotes considerable attention to the problem of diag- 
nosis, he may experience a tendency to generalize his definition of 
the problem so as to encompass an increasingly wide circle of prob- 
lems. The danger of this tendency is that it may result in the ex- 
tensive discussion of diagnostic programs and systems of impressive 
capabilities which are founded more on wishful thinking than on 
practical experience. Because the appeal of such an intellectual ex- 
ercise is strong, it is important to consider carefully the model of the 
diagnostic problem being employed in order to obtain a realistic view 
of both its potential and limitations. Some of the important charac- 
teristics of the model employed in this research are investigated 
here with this intention. 

A diagnostic model based on attribute-state relationships has 
understandable appeal. In many diagnostic problems the most visible 
aspect of an expert's attack on a problem is his gathering of attrib- 
utes on which to base his decision. In many instances he may appear 
to relate these attributes directly to the possible states of the 
system. When the difficulty of diagnostic problems in general is 
considered, however, it seems unlikely that the human expert per- 
forms only a simple association of attributes and states to arrive 
at a diagnosis. Diagnosis, as performed by humans, seems to be a 
subtle and often complex process of association and deduction. 

The model employed in this research, on the other hand, is 
very explicit in the way in which it relates attributes and states. 
Associations in the information structure are relatively direct, 
and deduction is performed in a uniform manner for all problems. 
In one sense, the model employed by the diagnostic program appears 
quite rigid and simple. Even this brief comparison with human diag- 
nosis suggests an important question. Can this relatively simple 
model be sufficient for a diagnostic program to perform effectively? 
A derivative of this question is the following. To what extent can 
a program based on this model be successful in performing diagnosis 
in a variety of problem areas? Although the evidence gathered from 
this research is far from sufficient to allow definitive answers to 
these questions, it does permit some insights into the problems to 
which these questions are addressed. 

The author believes that the basic functions developed in this 
work reflect aspects of a diagnostic program which has both potential 
generality and power. At present, the functions are quite crude in 
their structure and capabilities, but the conception of diagnosis in 
terms of these functions (or their more sophisticated successors) is 
believed to be both a useful and viable one. One problem may be that 
the current separation of functions is somewhat restrictive, but this 
has the advantage of emphasizing the principal objectives and problems 
of each. This emphasis is very important in the initial phases of 
research in this area, and the separation permits the study of differ- 
ent versions of one function more or less independently of the others. 

In broad outline, the model incorporates the principal features 
of diagnosis as performed by human beings. The inference function 
coupled with the information structure allows the consideration of 
both past experience and current information in a particular diagnosis. 
Bayesian inference provides an orderly way for balancing these two 
elements in the deductive process. The test selection function pro- 
vides the program with a rational means for choosing tests which 
accounts both for their cost and their potential value in furthering 
the diagnosis. Finally, the pattern-sorting function provides a 
means for performing diagnosis in the presence of noise. 

While it is unlikely that the human diagnostician employs this 
particular division of the diagnostic function, the total capability 
incorporated in the functions seems to approximate that required. 
It is also important to note that there is no particular reason to 
require a diagnostic program to simulate the processes employed by 
humans. A more appropriate requirement is that a diagnostic pro- 
gram should allow the exploitation of the comparative advantages of 
a computer in order that the total diagnostic capability of a man- 
machine partnership may exceed that attainable by either alone. 

For example, it has been noted that doctors do not organize 
their diagnostic experience into large lists of symptoms and diseases, 
but rather associate their experience with and through their under- 
standing of the human body and its processes. It would be extreme 
to conclude from this that such an organization is a necessary one 
for diagnosis, particularly if the diagnostician is a computer pro- 
gram. The fact that a doctor does not order his experience primarily 
in terms of attribute-disease lists may simply be evidence of the 
difficulty he encounters in attempting to deal with and maintain 
information of this form. A computer program would have less of a 
problem in this regard, and, in fact, this may be a useful structure 
to impose on the experience employed by a diagnostic program. 

While in very general terms, the functions of the program corres- 
pond to those apparently required for diagnosis, there remain cer- 
tain questions about limitations arising from their current realiza- 
tions. In a sense these are questions about the generality of the 
model. Since the program was designed to solve the model diagnostic 
problem, it is reasonable to expect that the generality of the pro- 
gram will be determined by the extent to which real diagnostic prob- 
lems can be described by the model. (Also, the appropriate statis- 
tical data must be available.) 

For example, a major difficulty in applying the program to pro- 
gram debugging is developing a proper characterization of states. 
One can see in theory how this can be accomplished, but a practical 
solution would be extremely difficult. Also, an extremely useful 
strategy in program debugging is changing the state of the program (by 
changing instructions, etc.). Here tests may very well change the 
state of the system. Because one can save a copy of the program, 
one can also use destructive testing. While one could probably 
change the model (and program) to reflect these possibilities, the 
current model does not account for them. Hence, the use of the pro- 
gram in this area is severely limited. 

Also, there may be areas in which the diagnostic experience 
may not fit the statistical model employed in this work. In these 
areas, the inference function would have to be redone for 
non-Bayesian inference. 

On the other hand, there seem to be a number of real problems 
which can be described by the model, including many machine failure 
and medical diagnosis problems. While the evidence is limited, the 
performance of the current diagnostic program in the areas of con- 
genital heart disease and bone tumors should not be overlooked. At 
the very least these results must be termed promising. The model on 
which the program was based and the program itself were developed 
independently of considerations of these particular diagnostic prob- 
lems, and yet the program demonstrated potential value in both areas. 
There seems reason to believe that other problems of medical diagno- 
sis will also prove susceptible to such a program. The diagnostic 
system permits the study of alternative strategies developed in the 
light of such experiments, and this, too, should ease the problems 
of increasing the extent of its capabilities. 






Some of the difficulty in applying the program to new areas 
can be traced more directly to a lack of adequate data for an in- 
formation structure than to an inherent intractability to this ap- 
proach. If continued research yields further indications of the 
value of a computer program for diagnosis, it may well be worth the 
considerable effort required to reformulate a number of diagnostic 
problems in terms of this model or an extension of it. Certainly, 
the results of this research do not preclude this possibility. 

Appendix 1 
Sample of an Input File 

(STATE D01 0.05 S01 0.01 S02 0.10 . . . S17 0.90) 
(CLUSTR D01 EXOR 0.05 S06 0.07 S07) 

(ATTRIB (S01 S02 S03) TEST1 S04 TEST4 ... (16 S17) TEST16) 

(TESTS TEST1 10. . . . TEST16 15.) 
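
The parenthesized records above can be read with a very small list parser. The following Python sketch is illustrative only and is not part of the diagnostic system. It assumes that a STATE record holds a state name, its prior probability, and then alternating attribute names and conditional probabilities, and that a TESTS record holds alternating test names and costs. These layouts are inferred from the sample, and the function names are hypothetical.

```python
import re


def tokenize(text):
    """Split input text into parentheses and atoms."""
    return re.findall(r"[()]|[^\s()]+", text)


def parse(tokens):
    """Parse a token list into nested Python lists, one per top-level record."""
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def pairs(items):
    """Read alternating name/number items as a dict of floats."""
    return {items[i]: float(items[i + 1]) for i in range(0, len(items), 2)}


def read_state(form):
    """Assumed layout: (STATE name prior attr p attr p ...)."""
    _, name, prior, *rest = form
    return name, float(prior), pairs(rest)


def read_tests(form):
    """Assumed layout: (TESTS test cost test cost ...)."""
    return pairs(form[1:])


# A shortened version of the sample, with the elided entries left out.
sample = """
(STATE D01 0.05 S01 0.01 S02 0.10 S17 0.90)
(TESTS TEST1 10. TEST16 15.)
"""
forms = parse(tokenize(sample))
state = read_state(forms[0])
costs = read_tests(forms[1])
```

The CLUSTR and ATTRIB records nest sublists and operators (EXOR), so they would need their own readers. The generic `parse` step already returns them as nested lists.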

Appendix 2 
Trace of a Session with the Diagnostic Program 

User responses = small letters 
Program responses = capital letters 



1. r system 

2. NAME OF DIAGNOSTIC AREA PLEASE 

3. bone tumors 

4. NAME OF LOSS STRUCTURE FILE 

5. bone losses 

6. INFORMATION STRUCTURE ESTABLISHED 

7. generate brief 

8. YOU OR ME 

9. me 

10. HISTORY FILE 

11. bone case 

12. CODES 

13. 3 2 2 2 3 2 3 

14. NEW CASE 

WHAT ARE THE INITIAL ATTRIBUTES OF THE PROBLEM. Q. 

15. s05 s07 s11 s14 s17 s20 not s21 

16. CONDITIONAL PRIOR STATE PROB 

CB 0.26 
CS 0.09 
GC 0.62 
CF 0.02 
TRACE 0.01 

17. ANY IDEAS. Q. TYPE 'DONE' IF SATISFIED. 

18. c.r. 

19. SET SEARCH DEPTH, THRESHOLD, AND HEURISTIC CONTROL 

20. 1 0.10 

21. THE TEST SELECTED IS TEST43 

22. s43 

23. CONDITIONAL PRIOR STATE PROB 

CB 0.55 

CS 0.04 

GC 0.37 

CF 0.04 



24. THE TEST SELECTED IS TEST50 

25. s50 

26. CONDITIONAL PRIOR STATE PROB 

CB 0.21 

GC 0.78 

TRACE 0.01 

27. GC TENTATIVE DIAGNOSIS FOR THIS PATTERN 

28. CONSISTENT DIAGNOSIS FOR ALL ATTRIBUTES 

Notes 



A. Line 7 through line 14. The user sets controls for the run. 
These controls include a history file and instructions as 
to what information is to be collected in this file during 
the run (line 13). 

B. Line 15. These are the initial attributes of the problem. 

C. Line 16. The inference function reports the current dis- 
tribution. 

D. Lines 17 and 18. The user is given the option of testing 
his hypothesis about the problem. He declines this option 
(line 18). 

E. Lines 19 and 20. Here the user sets the depth and threshold 
for the test selection function. He also chooses the stan- 
dard version of this function. 

F. Lines 21 and 22. The program selects a test and the user 
responds. This dialogue continues through line 25. 

G. Line 27 and line 28. The program makes a terminal decision 
for the pattern. This decision accounts for all attributes and 
the case is completed. 
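
The test selection step of notes E and F can be illustrated by a one-step lookahead. The Python sketch below is a simplified illustration, not the program's own routine: it assumes binary attributes, a loss table `loss[i][j]` giving the cost of diagnosing state i when the true state is j, and hypothetical state names, test names, and numbers. It searches to depth one only and does not model the threshold or the heuristic control set in line 20.

```python
def expected_loss(posterior, loss):
    """Loss of the best terminal decision: min over i of sum_j loss[i][j] * P(j)."""
    return min(
        sum(loss[i][j] * p for j, p in posterior.items())
        for i in loss
    )


def update(posterior, likelihood, present):
    """Bayes update on one binary attribute.

    Returns (probability of the outcome, updated distribution).
    """
    joint = {
        s: p * (likelihood[s] if present else 1.0 - likelihood[s])
        for s, p in posterior.items()
    }
    total = sum(joint.values())
    if total == 0.0:
        return 0.0, posterior
    return total, {s: v / total for s, v in joint.items()}


def select_test(posterior, loss, tests):
    """Return (best test or None, expected loss) for a one-step lookahead.

    None means that stopping now is at least as good as any single test.
    """
    best_name, best_value = None, expected_loss(posterior, loss)
    for name, (cost, likelihood) in tests.items():
        value = cost
        for present in (True, False):
            p_outcome, post = update(posterior, likelihood, present)
            if p_outcome > 0.0:
                value += p_outcome * expected_loss(post, loss)
        if value < best_value:
            best_name, best_value = name, value
    return best_name, best_value


# Hypothetical two-state problem.
posterior = {"A": 0.5, "B": 0.5}
loss = {"A": {"A": 0.0, "B": 100.0}, "B": {"A": 100.0, "B": 0.0}}
tests = {
    "T1": (5.0, {"A": 0.9, "B": 0.1}),   # informative, cost 5
    "T2": (1.0, {"A": 0.5, "B": 0.5}),   # cheap but uninformative
}
choice, value = select_test(posterior, loss, tests)
```

In this example the informative test is chosen even though it costs more, because the uninformative test cannot lower the expected loss of misdiagnosis.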

Appendix 3 
Listings of Diagnostic System 

COMMON MAD 

PMt-MF^ftW2^CR4T^S46NS»CWUVAI^#AT,CPR40R 

1 ,CTEST.ALLTST,DISEAS, STAND, FILE1.FILE2, 
2 DEPT H. T i fR 8S M,Nl N IT t. MO I S E. NO D ES. R A SE , TREE , CU RL ST 

3 ,PATLST, STRUCT, SVNCNT,SVNLST,UNACTOtTSTRUN.PATSTK 
V^ O O C .Q P S TCK , S TACK, U E UMC,A R CS T PR I * t CONST r*PO**T 

5 V NPRIM,CELL 
DM* B UFU 4 32 1, iU F2 |4» 2 l,ST*CK UO^rUFUNCUOJ-, AR6S4 W4- - 

I, PRIM(30>,NPRIN<30>»C0NSTO0),CELU20) 



KA&tER MAO — 

N"R 
INSERT EILi COMMON 

D*N VAULT! 1001 

SET LIST TO VAULT 
1 > 4£T«-<T-RCS-) 

NEMVAL.<$VALUESS,LIST.(9>.LlST.tCELL(8)>) 
UUT .tO P S T CKl 



LIST.(TSTRUN) 
iiiT.ifiti iiit 

^■i ■ • ■ ■ ■ w^^p^p^f* ■ p- 

LIST.ISVMLSTI 

-L I S T .< P A T S*m 

LIST.IUNACTOI 
U S T . CT i MM 



LIST. I STRUCT I 
.-VAULT^O- 

UFUNC— 1 
— PRIMU S PL U S . - 

PRINI2I«HINUS. 

fRIMIlUTIMES. 



PRIMI41-0IVIDE. 

PR!N(6|-LE. 
— PR1M4J4-«£Q« 

PRIM(S)-GE. 
— > R 1M< 9» » 6, 



PR1M(10I»AN0. 

— **TnlHW fr#™l#pl* "-— — " ^ 

PRIMU2I-E0V. 

pill mill -mot. 

PRINU4I-ATTRIR. 
PR lM»t » H» R « » 



V*S NPRlH-15 t SPLUS*ttMINUSS«tTIMES*,SDIVIDE*, 
-V- > L t . » LE t , »EQ> . *C£ *»*fi»+ t AND t , IOR » r»EO¥*« 

2 fM0Tlt$ATTR18$,SPRES$ 

ROLONL.I TEMPI 

Ml »P RT P .I T 8M R | 



N2«POPTOP.f TEMPI 

SETU P. t Ml *-M2) 

PRINT COMMENT $NAME OF LOSS STRUCTURE FILE$ 
RDLONL.(TEMP) 

tii tmwi-mn a tf-m j* *, __ ___ 

J 1ft*rUrtt/r*1 tcnTT 

N2*P0PT0P.(TEMP> 

SCTLOS. I RJUST » INI I ,RJUST, I N 2 ) > 



PRINT COMMENT $INFORMATION STRUCTURE ESTABLISHED.$ 

fOP^ ROLONL^^TEMPJ 

CODE-PQPTOP.ITEMPJ 

W**-GODE^&.*OEFiM£* 

LST-DEFINE.(TEMP) 

O'R GOP E.E.A GE NE R St. Qft . GOD E . E»A G E Nt 

GENER8.I TEMPI 

■ TTvtrrl — VUfinc fit fRClvKIV rRUn WCHCHO** 

0*R CODE.E.SCLUSTR* 

PRINT OCTAL RESULTS LST 

E^t ■ 



MTLIST.ITENP) 

-^TKl-TQP^ 

E«M 



^Mfr- -MAO 

EXTERNAL FUNCTION (CONTRL) 



F*T P, SELECT, FANS, FANS1 

B'N LEM P TV 

INSERT FILE COMMON 

€QUI VALENCE UP, fM T tFANS,ANS» f {FANSi»ANSH 

R 

-R 

R THIS FUNCTION IS THE CONTROL ROUTINE FOR THE 

R DIAGNOSIS. IT MANAGES THE MACRO ASPECTS OF 

R THE DIAGNOSIS. 

R 

R 

E'O DIAG. 

C0UNT*0 

H TlI S T .tC ELUl > > 



LIST. (TEMP) 

yip ct mn f_ i _na _ ^TAMn_p v. niiTPtiT_ t <iTtNn. n_Rl hk 1 

R 

R GET AND PROCESS THE INITIAL SYMPTOMS WHICH DEFINE 

R THE PROBLEM 

R 



W*R CBIT.E.l, P*T ILINE 
W^S-TL*f*E-*H/4IHA^ -ARf-THE- 4N4T-tAL-StCHS-QF^HE PROB L EM / * * 

H*R GETSYM.(TEMP).E.O, F*N 
WMt tEMPTY*+TEHPK F'H 

MTLIST.(PATSTK) 
M TLI S T.1U N ACTD) 

MTLIST.tTSTRUN) 
HTL-TST, (SYHLS T> 

MTLIST.(TREE) 

V*^BLN*-SYtH/ ^*E1^CA5E/t/** 

R 

R PROCESS THESE SYMPTOMS TO FORM SYMPTOM PATTERNS. 

R 
LOOP -tf*R- L EHP T Y.fT£HP4 T -T*0 ^ETPAT 

SYMP=POPTOP.(TEMP) 
TEST=BOT.(SYMP) 

TEST»ITSVAL.(SEXCLUSS,TESt> 

M I R TE S T .Ni.Qi N i H RO T .C TE t T . T S TR UNI 



SYMSAV.fSYMP.SINIT*) 

-^r « o loop 

R 



_v. 



R HERE IS WHERE THE MOST SERIOUS PATTERN IS 
R CHOSEN. DURING THIS ITERATION, TESTS WILL BE EVALUATED 
R RELATIVE TO THIS PATTERN. 



GETPAT 



P-SELECT.(O) 

OUTPUT »{CP AT, Oi INIFRH1 

V»S INIFRN-S/.H/THE CURRENT PATTERN IS.../«* 
■ VIS C H HS «W/T N E HEIGH T O F TH IS P A TT ERN IS / , F 4.3.1 H . »t 
PDUHPl.tCPAT,PATLSTl 

rHITMlT ■ IfllT. i .n IM. »' __ 

DDOMPl.(CPRIOR,CURLSTI 

OUTPUT « 1 ALLPAT , 0, OTFHT I - 

V«S QTFHT-*/, H/OTHER PATTERNS* ../,•* 

QIIMRP.IAURAT) 



TL 



R 
-R-- 



R HERE CHECK THE CURRENT STATE PRIOR FOR A SUCCESSFUL 
R DIAGNOSIS OF THE CURRENT PATTERN. 

R 

R - LRDRO V . UU R UT ) 

IP-ADVLER.IR.F1 

IRAROR.(R) 
JTiO-DOTCST , 

0*R P.L..99 
TI P U 

E»L 

OUTPUT. (STAND, I, ANSFRN.NAME) 

V'S AHSFRH-SH/THE CURRENT PATTERN^ tS~ AT TJUBUT£0 -10- J-*— - 
1 C6,//«* 

_R . 

R CHECK FOR MORE SYMPTOMS TO EXPLAIN. 

--R. 

GOTPAT.IOI 
— WUL-CO O E.E.O t T-AO-GET-PAT 

OUTPUT.(STAND,0,OKVS) 
V'S OKVS=$H/CONSISTENT DIAGNOSIS FOR ALL SIGNS./*$ 



SUCC 



IRALST.(TEMPI 

-FAN-*OKA- 
R 



P OT ES T 



RRD 



I CBUH T- O 



H*R CBIT.E.0.0R.CPRIOR.E.2 

-NSTATE-0 

HORO-0 
^T*0-S«K 

E»L 

PRINT COMMENT $ANY IDEAS. Q. 'DONE' IF SATISFIED.$ 



CONTINUE 
RtT-CA^4*ORO 
V'S C6=$C6*$ 
W'R WORD.E.$DONE$ 

OUTPUT.(STAND,0,UTERM) 

V'S UTERM=$H/USER TERMINATED DIAGNOSIS OF PATTERN/*$ 

T'O SUCC 

0*R WORD.E.SNOS.OR.WORD.E.SS 

*STAT€*0 

WORD-0 

-Q*-E 

NST ATE-TRANS. ( WORD, 1 ) 

W' R N ST A T E .S . O 

PRINT COMMENT $NOT RECOGNIZED. TRY AGAIN.$ 

TMM«B~ 

E'L 

-E^L 

R 

R 'SEQDEC' IS THE TEST SELECTION ROUTINE. 



R 

PJWlVGQMNE*T-«ET-DEPT*rTHR*S+^**-AW^EURK44G^0liH 1 RGL^ 

W'R .NOT. LENPTY.(RDLONL.( TEMPI) 

DEPTH-POP TOR«-fTEMP~» 

THRESH-POPTOP. I TEMP ) 

C O N TRL-P Q PTO P. I TE M P) 

E'L 

ST ATE -NS TATE 
^H^STANO^&t^2^ «UTP4JT-4 2,?r€RWrOEPT+i,THft*5H^ 

SEQDEC. ( TREE 1 0» STATE ) 
W ' R A LLT S T . E . O , T ' O G E TTST 

RDR-SEQRDR.{ ITSVAL. (*VALUES*fTREEH 
STATE-S€GL«.<RD« T I^ 

IP-SEQLR.IRDR*!) 

WH(-WORO^NE*0- 

W'R STATE. E.SDUMMYS 

P RE -»M OT » 

O'E 

Pne-« 

E'L 
Qi£ _ 

WORD- ITSVAL. I $PNAME$ t STATE ) 
PftC**4 



E'L 

OOTPWwfAtLTST^^TRMDrPRtr^OROTP^ 

V'S TRMD-SH/ BEST TERMINAL DECISION AT THIS POINT IS /,C3,/ 

~ X fVvf fl/ II 1 § FT Cnrtv f~Clr LUdv r >■"*■*" " ~"' 

OUTPUT. I ALLTST,0«THEAD> 

V'S TMCAD -» H/ TEST 6«ST EILOSS)/ «> 



TLOOP TEST-SEQLR.|RDR t II 

W**F»Ne»l 

ANS*SEQLR.(RDR,I> 

^NSi^OTWTEST-1 

NAME* ITSVAL. 1 *PNAME*t TEST I 

OUT P UT. (ALLTST , 3 t TU N, N AME , F A N Sl f T AN S F AN S! I — 

V'S TLIN»SC6,3S,F5.1,5S,F8.2»* 

TH^ TLOOP- 

E'L 

^WTPUT»i ALLT5T , 1 1 SCORE .NODES) 

V'S SC0RE-SI6 t H* DECISION NODES CONSIDERED.,//** 

ft ; 

R SELECT THE BEST TEST 
GETTST W'R CONTRL.G.0 
LC=TOPTH.(TEST,STATE) 

COUHT^COUW*tC 

T*0 CKS 



AGAIN TOPT.fTEST, STATE) 

CKS^ MAR-STAT«*NE*0 

LCOUNT-LCOUNT*LC 

4*-tR-LCOUN*_4™ COUNT 

POPBOT.fCELL(ll) 
COU MT- COUM T- 1 



MTLlST.<ITSVAL.(SVALUESS t TREE>) 
jf*a_S€E« 

E'L 
DEWi -OUIPA^^4MJ40^4^0f^*S¥At^44PNAMe^4TAWl) 

V'S DF=$C6,H/ TENTATIVE DECISION FOR THIS PATTERN./*$ 
TI Q SUCC __ 



0«E 

-NEWBOT^*EST*TST-RUN-) 

TTEST-ITSVAL.fSEXCLUSStTEST) 
-* tR-TT E S T .NE- 0*-NE*BOT.+TTESTV*STRON I- 

E»L 



R TEST HAS BEEN SELECTED. NOW RUN IT. 

_R_ , 

MTLIST.tTEMP) 

- W R CBl T .E.Oi »tE#T0P,-(-TESJ^ttNP4^ 

OUTPUT. ( CTEST. 1 » TFRM. I TSVAL. ( SPNAME*. TEST) ) 
G ET S Y M.I T SM P l 



W»R LEMPTY.ITEHP), T*0 AGAIN 
TKESt ^ Y WP«POPTOP^JEMP4 

SYMSAV.<SYMP,TEST) 
^ E S T- BQ T . ISV M P K 

NEMBOT.tTEST,TSTAUN> 

Ti S T- lUV AL. < t iXCl . Ut t . T iS T > 



U*R TEST.NE.O, NEMBOT. ( TEST, T5TRUN) 
. _MAJU»NOT* ^LEMPTY^WEIIP-U -^-iO- TRESl 

T'O GETPAT 
_ _V *-S-^CFftM*SH/OEPTH«/,-I4+H/- AND-TWREStU^»F**2*S- 

V'S TFRM=$H/THE TEST SELECTED IS /,C6*$ 
_EJLN 



GENERB MAD 

EXTERNAL FUNCTION (X) 



N'R 
-F • T RANNO,OLOP ^p^R^ESTP- 
B*N DNARKtLENPTY 

- -I NSERT-F4WE-C0NMQN 

EQUIVALENCE (IPR.PR) 

i ' O GENE R B ■ 



R 

R GENERB IS THE SIMULATOR FOR THE 

R DIAGNOSTIC SYSTEM. 

LI ST. (WORK) 

Ht R .N0 T .L 6 H PTY .tX) 



POPTOP.tX) 

4I0RUN4*F4ST -A CONTRVt- 
T'O OKTOGO 
E'L 

"H I HI wtJnHENF H™ THc wbncRATUfv* wHfj i~5 "CONTROL * w*"* 
R'T C6, ANS 

H' R ANS«NC»*VOU* 

CBIT-1 

PBfMT CDMMFWT tlOT Vflll Ut^M t H IITHttV Tfl BF IfFPT fl * 

R'T C6, ANS 

-WHl-AN$*E-.-*Y£$* 

T*0 GETFIL 

»*€ : 

FILE1-0 
FH.E2~fl 

T»0 6ETINC 
^ H __ 

E'L 

R 



R GENERATOR CONTROL HERE. SET CONTROLS. 



CBIT'O 
PR4MT- COMMENT- » H OW H ANY - CASES -W^TH«~RUN»0»* 

NORUNS*P0PT0P. ( RDLONL. I WORK I ) 

PRINT COMM ENT t N A ME OF ON E DI SEAS E OR C. R. FO R RANP O N P R A M * 

RSTAT R'T C6, ANS 

11^ |f ANS 4rC * ww " ~ ■ — " — ■ -' — " 

DHARK-OB 

0*E 

DMARK-18 
DISE A S ' TH AN S . U N S f 1> 



H*R DISEAS.E.O 
T«0 RSTAT 

-E^L : 

IPR«ITSVAL.C*PROB», DISEAS) 

— **t 

PRINT COMMENT $PLEASE SPECIFY (IN THE ORDER GIVEN) THE$ 
PRINT COMMENT $FOLLOWING CONTROL PARAMETERS FOR THE RUN.$ 
PRINT COMMENT $1. DEPTH OF THE TREE SEARCH.$ 
PRINT COMMENT $2. BREADTH LIMITING PROBABILITY.$ 
PRINT COMMENT $3. NO. OF INITIAL SIGNS PER CASE.$ 
PRINT COMMENT $4. NO. OF NOISE SIGNS PER CASE.$ 
PRINT COMMENT $5. HEURISTIC CONTROL FOR TEST SELECTION.$ 

--R- 

R 

-OEPTH»PQPTOP^IROL O N L . < W ORK I ) 

THRESH-POPTOP. ( WORK) 

N I N ITS « P0PT0 P» I W 0R*I 



NOISE- POP TOP. ( WORK > 

wWf I RL^rUrlUTf 1 1f URn w~^ "™ 

R 



GETFIL PRINT COMMENT $NAME HISTORY FILE.$ 
F ILEl " R>IUST . <POPTOP.<*PLO N L . tWORKm 



F I LE2-RJUST. I POPTOP. ( WORK ) I 
- -ASS4GN T fFU.€4 T BUF4rr BUFW - 
T*0 GETINF 



R 
GET INC P ' T CDF 



V«S COF-AH/FOR EACH OF THE FOLLOWING, TYPE •!• FOR A/,/, 

l++/£GN56LE TRACE, *0« 8THERWTSE,/** 

T'O RDINF 
GETINF PRINT COMMENT $FOR EACH OF THE FOLLOWING, RESPOND$ 
PRINT COMMENT $'1' IF YOU WISH A CONSOLE TRACE,$ 
PRINT COMMENT $'2' IF YOU WISH A HISTORY RECORD,$ 
PRINT COMMENT $'3' IF YOU WISH BOTH, AND '0' IF NEITHER.$ 
RDINF PRINT COMMENT $1. CURRENT DISTRIBUTION$ 
PRINT COMMENT $2. CURRENT PATTERN$ 
PRINT COMMENT $3. PATTERN STACK$ 
PRINT COMMENT $4. TESTS AND VALUES$ 
PRINT COMMENT $5. TEST SELECTED$ 
PRINT COMMENT $6. SIGNS OF THE PROBLEM$ 
PRINT COMMENT $7. STANDARD INFORMATION$ 
^ PRTQR- P Q PTOP . IR DLQNL .-MiORKJ-)- - 
CPAT*POPTOP.CMORK) 

ALLPAT-POPTOP* WiORK) 

ALLTST«POPTOP.tMORK) 
C T E ST-POPT Q P .IMQRK) 



SIGNS-POPTOP.fUORK) 

STAND- POPTQP»1MOMO 

OKTQGO PRINT COMMENT »* 

1 ALLTST.CTEST, SIGNS, STAND) 
. M t R CBl T .i.l „ . — _ 



T*0 DODIAG 

0« R DWARK 

T»0 START 

—EJ-L 

R 

R SET UP DISEASE SELECTION LIST. 



R 

_p«0* — 

LIST.(GENLST) 

RDR^SEQRDR x {TOP* f STRUC T ) ) 

SLOOP HSHLST«SEQLR.(RDR. I) 
Mt R I.MS.I 



R-SEQRDR.(HSHLST) 
HLOOP NEXT *S EQLR.tR,F) 

M*R F.E.I, T*0 SLOOP 
IPR«|TSVAL« < >PROB*tNEXT) — 

P-P+PR 
MANY. CCEMl ST, NEXT, P) 

T*0 HLOOP 
6 1 L 

R 

T*H RIN, FOR J-1.1.J.G.20 
-R4N R ANN O . IX ) 



R 

R CONTROL LOOP FOR THE GENERATOR. 

R 

START T^H~6END» ^©R- 4*W4rJ-G»NGRUNS- 

OLOP-0. 
P'T CO M, J 



V*S COM**H/CASE /,I2»* 

MtR^DMARK^T-*-0-60T-IT- 

TESTP-RANNO.(X) 

RDR-SE4R0R,4G£NLST4 

GLOOP DISEAS-SEQLR.(RDR,I) 
M » R I .S.I 



OUTPUT. I STAND, 0, BUG) 

V«S BUG**H/BUG IN GENLST/**- 

CHNCOM.(0) 
O'E 

TPR-SEGLR*+RORr+> 

W'R OLDP.LE.TESTP.AND.TESTP.L.PR, T'O GOTIT 
OLO P-P R — — 



T'O GLOOP 

E*-L -- ■ — - 

R 

GOTIT 0UTPUT.lSTAND,3,HEADl,J,ITSVAL.<SPNAMEi,DISEAS>,PR-0LDP) 

CURLST-0 

DODIAG DIAG.(CONTRL) 

WM^CBlT^Ir — — 

PRINT COMMENT $ANOTHER-Q.$ 
g^f-Cftf-ANS 

W'R ANS.E.tYESit T'O DODIAG 
T ' O FI N 1 — 

E»L 

GENO CGNTWUE 

OUTPUT.! STAND, O.TFRM) 

-IttAtST^GENtSTf 

V«S TFRM-$//,H/RUN COMPLETED. /*$ 

f+M4 FILE . tFILEl) — 

IRALST.(WORK) 

FJ-N -- ■- " 

R 

V»S I1=$I1*$ 
V'S C6-SC 6« t 



V'S HEAD»*H/SWITCHES FOR THIS RUN/,/, 

!^TH€PRH)R»Tll,lS f 5«GPAT- r IlrlSi?HAU.PAT=, 

2 II, l$,7HALLTST*,Il,lS,6HCTEST=,Il,lS f 6HSIGNS=, 

-3 *i,iSr6HSTANB~f|-tT/*$ 

V«S HE ADi=$//,H/»» •••••••*•••*•••»•••••••••••••••••/ , 

1 // ,H /CASC / , I3 iH /. OISCASC IS /,C 6, 2H ( . 

2 F3.2,2H)./*$ 

E'N 
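
The generator above appears to choose the disease for each simulated case by comparing a random number with a table of cumulative prior probabilities (the test OLDP.LE.TESTP.AND.TESTP.L.PR in the GLOOP loop). A minimal Python sketch of that selection rule follows. The function names, state names, and priors are hypothetical, and the sketch is an interpretation of the listing rather than a transcription of it.

```python
import random
from itertools import accumulate


def build_cumulative(priors):
    """Pair each state with the running total of the priors up to and including it."""
    names = list(priors)
    return list(zip(names, accumulate(priors[n] for n in names)))


def draw_state(cumulative, u):
    """Return the first state whose interval [low, high) contains u."""
    low = 0.0
    for name, high in cumulative:
        if low <= u < high:
            return name
        low = high
    # The listing reports 'BUG IN GENLST' when the table is exhausted.
    raise ValueError("u lies outside the cumulative table")


# Hypothetical priors chosen to sum to one.
table = build_cumulative({"CB": 0.25, "GC": 0.5, "CS": 0.25})

rng = random.Random(0)          # seeded so that a run can be repeated
case = draw_state(table, rng.random())
```

Each state is selected with probability equal to the width of its interval, which is its prior probability.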



GETSYM MAD 

EXTERNAL FUNCTION (LST) 

N»R 
+iT TESTpTPftrPet^rPNtW^T^rRANNQ 

INSERT FILE COMMON 

B'N LC MP TY f N A M T5T i SPTCST 



R 

R THIS FUNCTION HANDLES ALL THE SIGN RETRIEVAL 

R ACTIVITY FOR THE DISEASE GENERATOR AND THE 

R DIAGNOSTIC PROGRAM. 

R 
C«Q GETSYM* 



RET*1 

L I ST. 1 WORK) 

W'R CBIT.E.l 

W'R SIGNS^Wl 

DS»2 
-Q^E 



DS=»0 
^^L 
R 
R HERE THE USER IS IN CONTROL. SIMPLY RETRIEVE 
R THE NEXT SYMPTOM FROM HIM (WITH TRANSLATION 
R AND CHECKING). RECORD SYMPTOM AS CALLED FOR. 



OUTPUT. (DS,0, FIRST) 

ROLONL*4*QRK-> 

LOOP W*R .NOT. LEMPTY.(WORK) 

4*AH§*POPK)P»4WOWU 

W'R NAME.E. SNOTS. OR. NAME. E.$NO$ 
NL-POPTOP.IHORK) 



W«R NAMTST.(NL) 

-$ TRANS -+a) 

0»E 

^^SEOROft. tNL^ 

SL NL*SEQLR.(ft,F) 
W ' R F . N E. l 



STRANS.fO) 

T^Q-SL 

E»L 

^14 

INTERNAL FUNCTION (DM) 
E'Q S T R AN S. 



WORD»TRANS.(NL,2) 
^*^R-V*ORD.E»^r- I-«4)-EftRIMK~ 

NEWTOP.(-WORDtLST) 
-OUTPUT^fOS^+rNRMrNW) 

F»N 

E« N 



O'R NAME.E.SNORMALS 

R' SE Q RDR . IP OP TOP. t WORK) 1 

TL NEXT*SEQLR.(R*F) 

W 'R F. NE^l 

OUTPUT. (OS, 1, NT, NEXT) 
V'S NT»S H /N ORMA L / .C6 »t 



Rl*SEQRDR.(ITSVAL.(SNEMBERS,TRANS.(NEXT,3H) 
TL4, SYNP-S£QL*^R-U*4J 

W«R Fl.E.lt T»0 TL 
NE-WTOP.4-SYMP,LSTJ 

T«0 TL1 
tU. _ 



0»E 
HORO» TR A NS . ( MAM£^2T 

W«R WORD.E.O, T*0 ERRMRK 
NEWTOP-^TMOROriST^ 

OUTPUT. I DS,1,P0S, NAME) 

-6-Lfc 



T»0 LOOP 

E«L 

R 
R WHEN THE CURRENT LIST IS EMPTY, INITIAL SYMPTOMS 

R MUST BE GENERATED. 

-R 



0»R CURLST.E.O 

-OUTPUT *4S4GNS+0* UiTERM-)- 
MANY.( LIST. ( TEMP ),DISEAS, 1.0) 

^QUNT~REtTST.TWORK^TENP-l- 

IRALST.(TEMP) 
-R 



R HERE THE INITIAL TESTS ARE CHOSEN AT RANDOM 

R TO OBTAIN THE INITIAL SYMPTOMS. 

R 
SWITCH=0 

^T*N-TGfcOOPi -FOR" *-*rlTrKG» NMiHS- - 
W*R COUNT. E.O, T*0 OUT 
— KT l fGQU N T i RAN N O . IXm 



K»0 

ana^c cntmii § unnw t 

GETl TEST»SEQLR.1R0R V I) 

-*•«+! 

M*R K.L.KTH, T*0 GETl 
COU N T « COU M T ■ i 



NEWTOP.tTEST*LST) 
SYM&EAhrttSf) — 

REMOVE. ! LPNTR. 1 ROR ) I 

-WHl-FO^-a-ST-l-H^^-SMK^H-i-- 

TGLOOP CONTINUE 

©** M ' R S W lTG H . E .O 



OUTPUT. < STAND, 0,N0S> 

RCT-G 

E*L 



R HERE A RESPONSE TO A PARTICULAR TEST IS 
R REQUIRED. THE TEST IS ON THE TOP OF 'LST'. 



-0*E 

SYMGEN.(LST) 

-E*t 



MC« IRALST . IMORKI 



F»N RET 

CnRnHK «U I PU I •! 5 1 ANDt U v cKn 1 

T*0 LOOP 



R THIS FUNCTION SELECTS A RESPONSE AT RANDOM 
R TO THE TEST ON THE TOP OF THE 'LST' GIVEN 
R THE KNOWN DISEASE 'DISEAS'. 



INTERNAL FUNCTION (X) 

— E^O-SYMGEN, 

TEST«POPT0P.aST! 

W ' R |T S V A t . t »&P T Efc T t, T *S T ).i.t Y iS» 



SPTEST-18 
-0»E 

SPTEST-OB 
_E*t 

TESTP-RANNO.fX) 
P OLO«0« 



R-SEQRDR. ( ITSVAL. I SNEMBERS.TEST) 1 
GtOO* NtXT»SE<H.ftn»(ft f S* 

N*R S.E.I 

-t»Ht S P T E ST, -F*N 

GLOOP1 N£XT»SEQLR.(R,S> 
W 'R S. E. I , F 'N 



NAME- 1 TSVAL. < SPNAMES»NEXT ) 
__ ni it pi it t «r i fty< , i ****** mi mf \ 

NEHTOP.l -NEXT, LST) 

^F«e GtQOPl 

0*E 
LOG- M E MB ER, CDI SE A S, 1T S ¥AL » I tHEHBE R t .NE XT) . * 

W*R LOC.E.Ot T*0 6L00P 



H»R*P*d ^N€*T,CONT»fLNKfU fCONT-^i LO&H «*H-- 

PNEW=POLD+PR 
W'R POLD.L.TESTP.AND.TESTP.L.PNEW 

MJHC- I T<WAI I tPMiMfft .HFYT I 

OUTPUT. (SIGNSfl.POS. NAME) 
H6H T QP. (N tX T t L S T) 



W«R SPTEST 
P4)U)«0»__ 

T«D GLOOP 
^M. 

F»N 
-Oi* 



POLD-PNEW 
— I*O^W00P^- 
E«L 



-E^L- 

E*N 

-R 



R 

V'^ E4aS T »» H/ 4tS6«.*ESPWSE /^C6*» 

V'S NOS=$H/INITIAL SIGNS ARE ALL 'NORMAL' SIGNS./*$ 
V'S ERR=$H/SIGN NOT RECOGNIZED. IGNORED./*$ 
V'S INIFRM=$H/THE INITIAL SIGNS OF THE PROBLEM ARE/*$ 
V'S NRM=$H/OBSERVED SIGN 'NOT /,C6,2H'.*$ 
V'S POS=$H/OBSERVED SIGN '/,C6,2H'.*$ 
--EJLM-- 



POUNP MAD 

N'R 
Fa_WT 

EQUIVALENCE UWGT.WGT) 
I NSER T PIL E COMMON 



E*0 OUMPP. 
WUUALLP-A T . E .O^^m 

OUTPUT. I ALLPAT.O, BLANK) 
R*SEORJ»UUUTSTK4- 

C0UNT»0 
-LOOR N£XT«SEQLR.(R»F) 



M'R F.E.I 

WMt C O UNT -E*W -OUT*UT*4^LLPJlT-^(M)NMr^ - 

OUTPUT. tALLPAT* Of BLANK) 



E«L 
C 0UNT*C QUN T »1 



W ( R NEXT.L.O* T'O LOOP 

M' R N E X T.E^CUftLST.ANO»CFAT.E.AlLFAT,T'0 LOO* - 

IWGT>ITSVAL.(*WEIGHT$,NEXT) 

^0UN*M.^LL*AT,-IT-S¥AL,44SYM*»S* f NEXWJ 

OUTPUT.(ALLPATil,CLINEiWGT) 
PD UHP 1 -(A LL PA T ,NEX T) 



T«0 LOOP 

R 

R 

y»5 BLANK* ♦/*> 

V'S ONLY=$H/CURRENT PATTERN IS THE ONLY ONE./*$ 
V'S CLINE=$H/PATTERN WEIGHT = /,F4.3*$ 



E'N 

DUMP1 MAD 
EXTERNAL FUNCTION (MARK,LST) 



N'R 

fiT PtPTQT 

EQUIVALENCE (IP, PI 

INSERT FILE COMMON 

E'O PDUMP1. 

W ' R M ARK.E.O , F' N 



CNT = 

R=»SE3RBRni{H-ST> 
LOOP SYHP=SEQLR.(R,F) 

W'R F-*E»i 

W'R CNT.G.O, OUTPUT. (MARK, CNT , SYL IN,*ARRAYS ) 

OUTPUT. ( M ARK , , B LANK) 

F'N 
E'L 

W'R SYMP.L.O 
CNT-CNT+1 
STACK(CNT)=$NOT $ 

EJ-t 

CNT=CNT*1 

STACK{£tm-ITSVAt.r(*PNAMES,SYMP).V.SGOO,00$- 
W'R CNT.G.17 

OUTPUT.<NARK-v£NTrSYLIN,SARRAY*) 

CNT = 
E_H- 



T'O LOOP 
-R 
R 
E'O DDUMP1. 
W'R MARK.E.O, F'N 
PTOT - 0. 



R=S£QRDR.(LST) 
OUTPUT .( HARK, G , OL I NE ) 
LOOP1 STATE»SEQLR.(R,F) 
W^* F.E.I 

OUTPUT. ( HARK, 2, LINE, TRACE, 1. -PTOT) 

OUT P UT . ( HA RK , 0, BLA N K) 

F'N 
E'L 

IP=SEQLR.(R,F) 
W'R P.L.l.E-2, T'O LOOP1 
OUTPUT. (MARK, 2, LINE, I TSVAL. ( SPNAME*, STATE) , IP ) 

P TOT -P TOT iP 

T'O L00P1 
R 
R 
V'S BLANK=$/*$ 
V'S TRACE=$TRACE$ 
V'S SYLIN=$18C4*$ 
V'S DLINE=$H/CONDITIONAL PRIOR STATE PROB/,/*$ 
V'S LINE=$20S,C6,F4.2*$ 

E'N 



OUTPUT MAD 

EXTERNAL FUNCTION ( MARK, NARGS, FMT, Al , A2, A3 ) 






N'R 

4 NS€RT^4t€ -COMMON 

E'O OUTPUT. 

H' R A l . E .SA RR A V t, T'Q D O( H ARK) 



STACK! 1)«A1 

STACK(3)-A3 

T^O-DOmARKJ- 

00(3) CONTINUE 
0O424 W P R N ARGS. E . O 



DWRITE.(FILEL,FMTI 

^qle 

0WRITE.(FILE1,FHT, STACK (II... STACK (NARGS)) 

HO D0(MARK-2) 
0O44-) W'R MARCS. S.O 



P«T FMT 

fli£ 

P«T FMT, STACK ( i)... STACK (NARGS) 

00(0) F*N 

£114 



SELECT MAD 

EXTERNAL FUNCTION (DUMMY) 



R THIS FUNCTION EXAMINES ALL THE PATTERNS IN THE 

R PATTERN STACK. IT RETURNS THE NAME OF THE PATTERN 

R WH I C H -HAS^MnHMUM-feXP£CT€P-LOSS^AS-M>AT4.SJ^. 

R THE DISTRIBUTION CORRESPONDING TO THIS PATTERN BECOMES 
R 'CURLST', THE CURRENT DISTRIBUTION. 



R 

_Nm 

INSERT FILE COMMON 
- -F ^T— WE 1 GHT , WGT ,*SA¥E^ 
E»0 SELECT. 
PSAVE»0. 



FIRMUP.(PATSTK) 
ROR-SEQR«UiPATSTK-> 

SAVLST»CURLST 
TLOOP N£XJ«SEQLR.tRDR,I> 

W*R I.E.I 
R — 



R UPDATE THE TREE 

MTLIST.(TREE) 
NE*VAL^(*¥ALUeS4rLl$T^m T T*EE^ 

NEWVAL.ISPRIORS, CURLST, TREE) 
WiR SAVL S T . M E . CUR L S T . M TLI S T . ( CEL L! 1 ) ) 



F»N PSAVE 
_€±t 

HGT»MEIGHT. (NEXT) 
NEWVAL.(lW€IGWr*,WGT T NEXT) 

W»R NEXT.L.O, T»0 TLOOP 
W * R H CT .G. P S AV E 



PSAVE-WGT 

__ -CURLST^NEXT 

PATLST=ITSVAL.($SYMPS$,NEXT) 
E'L 

-r^o^rtoop- 



R THIS FUNCTION UPDATES THE UNACCOUNTED- 
R FOR LIST AFTER A SUCCESSFUL DIAGNOSIS OF A PATTERN. 
R IT CONTROLS THE FORMATION OF NEW PATTERNS FROM THE 

irsTnrTWW itcnRinmu un rrrc wtta^tv liji* 
R 
E 'O G OTP A T . 



CODE-1 

a f\n» t'g rtft nft jj. tujf Tn i 

LOOP SYMP*SEQLR.(RDR t I> 

CHEOK W^fl-KE^tf-T-^O -PRUNE 

LOC«MEMBER.(SVMP v PATLST,0) 
W'R LOG.E.O 



W*R SYHP.G.O, CO0E*0 

— ^r^e-Loop 

0«E 

-ABD-L^NmrtRORJ 

SYHP-SEQLR.(RDR.I) 
RE M OVE . I APPI 



T«0 CHECK 



R 



PRUNE U*R COOE.E.lt F*N 
RDR - SE Q RPR.I P ATSTKI 



LIST.(TEMP) 

tUbri N wv7 H* 9CwtH rHi I/Ht*"*! "™ ■ 

H ( R l.E.l 
f lq ReSTGR 

0«R NULST.L.O 
N EM B OT. INULST t T EM P ) 

0*R NULST.E.CURLST 
NEWBOT. < -N ULST rTEMFI 

E«L 

T^O-bOOPi 

R 



RESTOR RDR*SEQRDR.(UNACTD) 

MTL-IST T H»ATSTK+ 

L00P2 SYHP»SEQLR.(RDR,I) 

y± R--|-m-E«-1 

INLSTR.(TEMP t PATSTK) 
IRAL S T«tTC MP ) 



F«N 

-E*t- 

W»R SYMP.G.O, PATFRM.(SYMP) 

T»Oi«K 

E»N 



PATFRM MAD 

EXTERNAL FUNCTION (SYMP) 

ft 



R THIS FUNCTION FORMS ALL THE DISTINCT PATTERNS 
R FOR A GIVEN SYMPTOM, 'SYMP'. IT PROCESSES 
R ALL PATTERNS SO FORMED AGAINST THE CURRENT 
R PATTERN STACK. IF THE PATTERN IS A NEW ONE, IT 
R IS RETAINED, OTHERWISE IT IS DISCARDED. 

R 

H±* 

B»N SUBSET 

INSE*T-FTtE-COMMQN 

F'T P, UPOl 

E^O^PATE-R-Mw — 

HEMIST* ITSVAL . < $MEMBER$, SYMP ) 

_R — 

R PROCESS THE SYMPTOM PATTERN FOR EACH STATE ON THE 

R MEMBER LIST OF 'SYMP'. 

R 

RBR*SEQRm.4*EHL£T4 

LOOP STATE*SEQLR.<ROR,I) 

W' R I.E . I, F'N 

R 

R CHECK FOR THIS STATE IN THE CURRENT PATTERN STACK. 
R IF IT IS THERE THEN ITS SYMPTOM PATTERN MUST ALSO 
R BE THERE, AND IT SHOULD BE IGNORED. 

R 

S EQL R. IRO R. I ) — 

W«R MEMBER. (STATE, PATSTK, 1 ) .NE.O, T»0 LOOP 

R STATE NOT FOUND IN PATTERN STACK. SYMPTOM PATTERN 
R FOR THIS STATE MAY BE A NEW PATTERN. 
R GET THE 'PARTIAL SYMPTOM PATTERN' FOR THIS 
R STATE GIVEN THE CURRENT SYMPTOM LIST. 



R 

INSECTrI-WW*ACT^r5TATE,LT^T,TTEMP>) 

R IS THIS PARTIAL PATTERN A SUBSET OF AN EXISTING PATTERN. Q. 

R--SEQRDR* I PATSTK) 

CLOOP NEXT=SEQLR.(R,F) 

W R F . NE . I — 

W»R SUBSET. ( TEMP, ITSVAL- ( $SYHPS», NEXT)) 

IRA4ST»TTEMP) 

T'O LOOP 

-Q*-E ^ 

T»0 CLOOP 
EJLL 



E*L 

R 'TEMP' NOW CONTAINS THE PARTIAL SYMPTOM 
R PATTERN. CREATE THE STATE PRIOR FOR THIS PATTERN 
R AND ADD IT TO THE PATTERN STACK. 

R 

NULST=C0NT.(NEWB0T.<LIST.<9),PATSTK)+1) 

HE¥VAL.J4SYMPS*,TEMP*NUtST) 

IRDR-SEQRDR.(MEMLST) 

INLOOP 5TATE4»SEQUU+t*DR,IIi 

W'R II.E.l, T»0 PROC 

SE Q LR . IIRD R, II) 

W»R MEMBER. (STATE1, PATSTK, I). NE.O, T»0 INLOOP 
MANY. * NU4 ST *STAXET, ITSVAL. **PROBS, ST ATE 1)1 
T'O INLOOP 

R 'NULST' NOW CONTAINS THE STATES AND A PRIORI 
R PROBABILITIES FOR THE PATTERN IN 'TEMP'. 
R UPDATE THIS PRIOR BASED ON THE SYMPTOMS IN 'TEMP'. 
R 
PROC P=0. 
PRDR=SEQRDR.(TEMP) 
PteOP S¥MPl*SEI}tftH^ROR>*>H 

W'R PI.NE.l 
UPD1. (SYHP1 ,N ULST |N UL S T) 



T'O PLOOP 
-EJ-L 

IRALST.I TEMPI 
^TMH^OBP 

E'N 



UPD MAD 

EXTERNAL FUNCTION (SYMP) 

R_ 



R THIS FUNCTION SUPERVISES THE UPDATING 
R OF THE PATTERN STACK GIVEN THE NEW 'SYMP'. 
R EACH OF THE STATE PRIOR LISTS IN THE STACK 
R IS UPDATED (PROVIDED THAT THE 'SYMP' IS 
R RELEVANT TO SOME STATE IN THE LIST). 
R WHENEVER THE PROBABILITY OF A PATTERN GOES 
R TO ZERO, THE PATTERN IS DELETED FROM THE 
R PATTERN STACK. 

R 

N«* 

B'N LEMPTY, RELEV 

I N SERT F IL E CO MM ON 

F'T P.UPDl 

--EMJ-UPD. 

NEWBOT.ISYMP.UNACTD) 

nnn*CFnonp_ f PATCTtf 1 

_.. . _ — TMJTV'.^C v'vnl TTwwTn T 

LOOP STLST*.ABS-SEQLR.(RDR,II 

C H ECK W 'R I .E. I , T ' O F I N IS H 

R 



R CHECK THE RELEVANCE OF THE SYMPTOM TO 

R THE PRIOR IN STLST. 

W'R .NOT. RELEV. (SYMP, STLST) 
W* R S Y M P . L . O , NE WB OT . ( S Y M P , IT S V A L . U S YMP S * . ST L ST )) 

T'O LOOP 

■--■ E'L 

R 

R THE SYMPTOM IS RELEVANT. USE IT TO 

R UPDATE THE PRIOR. 
ft 

P=UP01.JSYMP,STLST,STLST) 

W l «-hE«0 

W'R STLST. L.O, UNDO. (STLST) 

ADO*LPNTR.<RO*> 

STLST«SEQLR.(RDR,I) 

SY MP S - ITSVAL.(»SY HP S $i RC H QVe.lADDn 

PLOOP W'R LEMPTY. (SYMPS) t T'O CHECK 

TSYMP»POPTOP.tS¥MPS* 

W'R TSYMP.G.O* PATFRM.(TSYMP) 

T'O PLOOF 

O'E 

N EW VAL . t t PRO B*. P , STLST ) 

NEWBOT.i SYMP, I TSVAL.t*SYMPS», STLST)) 

T'O LOOP 

E'L 
R 
R HERE THE SYMPTOM IS PROCESSED IN PATFRM 
R TO SEE IF ANY NEW PATTERNS CAN BE FORMED. 

-R . 



FINISH W'R SYMP.G.0, PATFRM.(SYMP) 

E*N 



_0J>^1^_ 



EXTERNAL FUNCTION (SYMP,LST1,LST2) 

-R 

R THIS FUNCTION UPDATES THE STATE PRIOR 
R IN 'LST1' TO ACCOUNT FOR THE NEW 
R SYMPTOM 'SYMP'. THE SIGN OF 'SYMP' DENOTES 
R THE PRESENCE OR ABSENCE OF 'SYMP'. 
R 'LST2' IS WHERE THE UPDATED PRIOR IS STORED. 



N»R 

V'S EPSI=1.E-4 



INSERT FILE COMMON 
~^iT-P*-P-W»€*$4,P-R,PR0* — 
EQUIVALENCE (IPROB.PROB) 
-BiK SAME 



E*0 UPD1. 

N« R CS T 1.S.LS T 3 



SAME- IB 

.QiE 

SAME- OB 



P-O. 

MS Mi S T- I TSV AL. I * M EM BE R» . S YMPI 



R 
R PROCESS EACH STATE ON THE MEMBER LIST OF 'SYMP' 

R 

R0RmSE0RDR*4LST4J- 

LOOP STATE*SEOLR.|RDR,I» 

CH E C K M'R l.E.l . 

W»R P.L.EPSI« F»N 0. 

RDA-LRDRQY.+LST2) 

AGAIN IPROB-ADVLER.IRORt I) 

NlR_UE*4^F-«Ji-P 

ADD-LPNTR.fRDR) 
S U t S T .I P RQB /P .ABO ) — 



T'O AGAIN 

^x± 

IPROB-SEQLR.IRDR,!) 
_LQOMEMB€R,4£TATEi*EMLS?rD^ 

W»R L0C.E.0 
Pft-O, _ 



0«E 

^R»P^J^SYJIP^4UJNT.44NKR*!C0NT-»14J)CJ4*4JJ 

E»L 

-WMt-SYMP-L,©- 

PROB-PROBM1.-PR1 
01 i 



P ROB- PROS* PR 

^a L 

R 
R CHECK FOR 'ZERO' POSTERIOR FOR THIS 
R STATE. IF SO, DELETE IT FROM THE LIST. 

R 
W'R P RO B .t.C P SI , T'O S CRAP 



P-P+PR08 
_ y* n T iHF 

ADD«LPNTR.<RDR> 

' " 4WV4 ■ » H*Rw8" V APvt ~~ 

O'E 

MAN V . tLSTa . STATCPRO B * 



E'L 
T 'O LO OP 

R 

n~ ncnc »3 ■ncfiiz w 7th i c r9 ntnu vc-if t nun ^t^r^ 

R 
«6ftAP W'R .M OT. SA ME , T ' O LOOP 



ADO'LPNTR.tRDR) 
- -AWH-t NKL-«-<€ONT-. fA0B-» 4— 

STATE-SEQLR.(RDR,I) 
— REfleVE^-MBOf 

REMOVE. tAOOl) 

T'O C H ECK 



E'N 
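
As far as the listing above can be read, UPD1 multiplies each state's probability by the probability of the attribute given that state (or by one minus it when the attribute is absent), drops states whose product falls below a small threshold, and renormalizes. The Python sketch below illustrates that update under those assumptions. The names and numbers are hypothetical, the treatment of clusters through PIJ is omitted, and states with no entry for the attribute are given probability zero, as the listing appears to do.

```python
EPSILON = 1e-4  # assumed size of the 'zero' threshold (EPSI in the listing)


def update_prior(prior, p_attribute, present=True):
    """Bayes update of a state distribution on one observed attribute.

    prior        maps state -> current probability
    p_attribute  maps state -> P(attribute | state); missing states give 0
    present      False when the attribute was observed to be absent

    Returns (total, posterior). total is the probability of the observation
    under the prior; posterior is empty when total is below EPSILON.
    """
    joint = {}
    for state, prob in prior.items():
        pr = p_attribute.get(state, 0.0)
        value = prob * (pr if present else 1.0 - pr)
        if value >= EPSILON:          # states with a 'zero' posterior are dropped
            joint[state] = value
    total = sum(joint.values())
    if total < EPSILON:
        return 0.0, {}
    return total, {state: value / total for state, value in joint.items()}


# Hypothetical two-state example.
prior = {"CB": 0.5, "GC": 0.5}
p_s43 = {"CB": 0.2, "GC": 0.8}
total_present, post_present = update_prior(prior, p_s43, present=True)
total_absent, post_absent = update_prior(prior, p_s43, present=False)
```

Returning the total as well as the updated distribution mirrors the way the calling routines in the listing test the returned value before keeping a pattern.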



PIJ MAD 

EXTERNAL FUNCTION (SYMP,CLUSTR) 

__Ulj( . 

INSERT FILE COMMON 

D PI EC nrTTf FHlnrj"! ' ~" " ' "" "" 

F'T Pl,P2 

C 8 UIVALE N C6 IPl i I.PI I f IPZ f 1P2) 



E*0 PIJ. 

-R 

R THIS FUNCTION OBTAINS THE PROBABILITY OF SYMPTOM 

R 'SYMP'. 'CLUSTR' IS EITHER THE PROBABILITY OR 

R THE NAME OF A CLUSTER WHICH CONTAINS 'SYMP'. 

-R 



W'R NAMTST.ICLUSTRlt F*N CLUSTR 

™ ^t ft 9* «TTCnr| "~ ■ " "™ - ~ 

LlST.tOPSTCK) 

LOOP NEXT-SEQLR.IROR,!) 

W . R UCil 



R 
R 

F * M -■- 1 » 

0*R NEXT.E.LPAREN 
T'O LOOP 



O'R NEXT.E.RPAREN 
R END OF A TRIPLE. PROCESS OPERATOR AGAINST 'TEMP' 



IPlspQPTOP.ITEMP) 
F IRST» P OPTOP .I T EM P) 



W'R LEMPTY.(OPSTCK) 

R 

R END OF THE EVALUATION 
IRALST.(TEMP) 

IRALST.IOPSTCK) 
P'N PI 



E'L 

JP2*P^M0P»(TEMP) 
SECOND-POPTOP. ( TEMP ) 
^PER*PQPTOP.{QPST€tU 

R 

R PROCESS OPERATOR HERE 



R 

*M* OPE-R-E T $aR4- 

N'R FIRST. E.l. OR. SECOND. E.l 

Pl*Pl+P2^ 

BMARK=1 
qjle 



P1»0. 

RHAftK^O 

E'L 
Q«E 

BMARK^FIRST+SECOND 
W'ft BMARK.G.l 



Pl-0. 
_&*A**=3 
O'E 

Pl=Pl+P2 

E'L 
-i-M= 



R 

R CHECK FOR AN OPERATOR HERE 
R 
O'R NEXT.E.*Q*t.OR.NEXT,E.iEXQ*$. 
NEWTOP. (NEXT.OPSTCK) 
T' Q LOO P 



R 

R PROCESS SUBCLUSTER HERE 
R 
O'E 

BMARK-INTERP.tNEXTJ 
W'R SYMCNT.E.Q 



P1 = 0. 
BMARK=0 
O'R BMARK.E.O 

O'R MEMBER- (SYMP, NEXT, 01. NE.O 



IP1*ITSVAL.<SPRGB$,NEXT» 
O'E 

Pl=l. 
E'L 
O'E 
Pl^O . 



E'L 
E'L 

MANY. (TEMP, PI, BHARK) 
T ' LOOP 
V*S LPAREN'SIS 
V' S R P AR EN =* 1 » 



E'N 

NSCOMP MAD 



LOOP 



EXTERNAL FUNCTION (TEST,PRIOR,LST) 

N'R 

* j t isupoi 

e'o nscomp. 

Tt-S£QRDR* t I TSVAL « { *MEMBER$i-T£S-Tl 1 
SYMP=SEQLR.(R,F) 
W'R F.HE .l 



P=UPDl.t-SYMP, PRIOR, LST) 
#Ht P^G* &--r -T H> LOOP^ 

E'L 

FH4-P 

E'N 



DEFINE MAD 

EXTERNAL FUNCTION (TEMP) 
N-M* 



B'N LEMPTY, OPER 
INSERT FTL£ COMMON^ 
E'O DEFINE. 

LST-0 

LIST.(RDRSTK) 
NA M E - POPTQP.tT E MP) 



NUM=rUFUNC(0) 

T *+f LOOK , FOR J= 1 , 2 , J -G^NUM - 

W'R UFUNCUKNE.NAME, T'O LOOK 

u*ft UFUNC ( j-t- 1 ) . NE . 

PRINT COMMENT SRELATION ALREADY DEFINED. REPLACE. Q.* 

ft'T tC 6» t , ANS - 

W'R ANS.NE-$YES$, T'O DONE 

LST=LIST.(UFUNCtJ+lH 

T'O START 

£«L --■ 

CONTINUE 
- N U H = N U M4- 2 



LOOK 



START 



UFUNC(NUM)=NAM£ 

L5T=LI ST. «UFUN£I*UM+1 ) ) 

PRINT OCTAL RESULTS LST 

UFUNCIO)=NUM 

PCOUNT-0 



O P CR - 1 B 

ARGLST=POPTOP. ( TEMP) 
*eR»SEQRDR.(TEMP) 
LOOP ELEM=SEQLR.(RDR,I) 

W'R I.E.I 

PCOUNT=PCOUNT-l 
W ' R LC MP TY«(RDRSTK) 



W'R PCOUNT.G.O 

PRINT COMMENT $NOT WELL FORMED. TRY AGAIN.$ 

UFUNCU + l»*0 
E'L 
T'O DONE 

-&** 

NEWBOT.(t)$, LST) 
ROR=POPTOP. IRORSTK ) 
T'O LOOP 
E'L 

W'R OPER 
0PE R *0B 



W'R ELEM.E.tQUDTEt 

E4.£*^SEQUU<RDR,-I^ 

T'O COEF 

E-«L 

CODE-ARGTST.iOt 

W' R CODE. M E. 0, 6L EH ~CQP E . 



NEWBOT.(ELEM,LST) 

0»E 

CODE=ARGTST.(01 
W'R CODE.NE»0 
ELEH=C0DE.V-54K10 
NEWBDT.{ELEM,ISTI 



CQEF 



O'E 

__ CNUM=CONST+aJ 

CNUM*CNUM+1 
CONST I tWJ* l=ZLE M- 
CONST(0)»CNUM 

NEWBOT. <CNUH,LSn 



E'L 



O'E 



__OPl€R^1B 

NEWTOP.(RDR,RDRSTK) 
RDR=SE Q R O R . (ELEH) 



NEWBQT.{$U,LST) 
PCGUNT=PCOUNT+l 
E'L 
- -3 lg- WO OP 



INTERNAL FUNCTION (DUMMY) 



A L UU P 



E'O ARGTST. 
ARDR=SEQRDR.(ARGLST) 
ACOUNT=l 
A T E MP *SEQLR. ( ARD R , A I ) 



W'R AI.E.l, F'N 

W'R ATEHP-E.ELEM, F'N ACOUNT 

ACQUNT-ACOUNT+1 

T'D ALQQP 

E'N 



DONE 



IRALST.IRDRSTKJ 

F'N LST 
E'N 



CLUSTR MAD 

EXTERNAL FUNCTION (LST)
N'R
B'N LEMPTY



F'T P, PSAVE 
EQUIVALENCE (IP,P)
E'O CLUSTR.
LIST.(TEMP)

STATE=TRANS.(POPTOP.(LST),1)

W'R STATE.E.0, T'O ERR

NULST=CONT.(NEWBOT.(LIST.(9),STATE)+1)

NEWVAL.($RELAT$,$CLUSTR$,NULST)
RDR=SEQRDR.(LST)

NEWBOT.(LPAREN,NULST)

SUB=0

LOOP IP=SEQLR.(RDR,I)
W'R I.E.1



NEWBOT.(RPAREN,NULST)
W'R LEMPTY.(TEMP)

IRALST.(TEMP)

F'N NULST

E'L

RDR=POPTOP.(TEMP)



SUB=0 

O'R I,W 

W'R SUB.E.l 

O'E 
N E WTOP. (RDR , T EH P ) 



RDR=SEQRDR.(IP1 
NEWBOT^+L PARtN^NytST^ 
E'L 

■--EM-R *Pn*€n.*GR*.OR T IP,E,*EXORS- 
NEWBOT.(IP,NULST) 

-&J-E 



SUB-1 
PSAVE-P 

E'L 

T'O LOOP ■ — 

R 
-R 



INTERNAL FUNCTION (SLST) 

EMJ DO I* ----- 

SUBLST=CONT.(NEWBOT.(LIST.(9),NULST)+1)
NEWVAL.($PROB$,PSAVE,SUBLST)
NEWVAL.($RELAT$,POPTOP.(SLST),SUBLST)
DRDR=SEQRDR.(SLST)

DLOOP NEXT=SEQLR.(DRDR,DI)

W'R DI.E.1, F'N
NEXT=TRANS.(NEXT,2)

W'R NEXT.E.0, T'O ERR

LOC=MEMBER.(NEXT,STATE,0)

REMOVE.(LOC)

LOC=MEMBER.(STATE,ITSVAL.($MEMBER$,NEXT),0)
ADD=LNKR.(CONT.(LOC))
SUBST.(NULST,ADD)

NEWBOT.(NEXT,SUBLST)

T'O DLOOP 

-e-«-N — 



R 

-■■ R 

ERR PRINT COMMENT $ERROR IN FORMAT$

F*N- — L 

V'S LPAREN=$($
V'S RPAREN=$)$



E'N 
INTERP MAD

EXTERNAL FUNCTION (CLUSTR)

N'R

INSERT FILE COMMON

E'O INTERP.

SYMCNT=0

BASE=0

SPOINT=0

STACK(0)=0

RDR=SEQRDR.(CLUSTR)

LOOP ELEM=SEQLR.(RDR,I)

W'R I.E.1

PUSH.(ITSVAL.($RELAT$,CLUSTR))
CODE=EVAL.(0)

W'R STACK(0).E.$*INC*$

STACK(0)=1

O'R CODE.L.0.OR.STACK(0).E.$FALSE$

STACK(0)=0

E'L

F'N STACK(0)

O'E

PUSH.(ELEM)

T'O LOOP
E'L

E'N



-EVAt MAO 

EXTERNAL FUNCTION (DUMMY) 
NJLR 



INSERT FILE COMMON 

E'O EVAL. 

POP. (NAME) 

R 

R CHECK FOR PRIMITIVE 



LOOK 



T'H LOOK, FOR J-l , I, J .G.NPRIM( 0) 
K'R HA*E.NE-NP*-P«UJ^T«0 LOOK 
CODE-PRIM(J).IO) 

f*H CODE 

CONTINUE 



R USER DEFINED FUNCTION 

r 

T'H ULOOK, FOR J*l , 2, J.G.UFUNC ( 0) 
WiR^JFUNCUl^NE. NAME, T'O ULOOK 
LST«UFUNC(J+1) 
H ' ft L S T . E . O , TlO E RR 



ULOOK - 
ERR 



P R O C 



T»0 PROC 

CONTINUE 

P*T ERRM, NAME 

V*S ERRM-*CW*A NOT DEFINED./** 

F*N -I 

RPR - SEQRDR.(LS T ) 



LOOP NEXT-SEQLR-(RDR,I) 

W'R I.E.I 

POP. (NEXT) 
RETAC.(NEXT)

F'N STACMO) 

O'R NEXT.E.S<$ 

P US H * tO) 

NEWTOP. ( SPOINT.OPSTCK ) 

*£XT=SEQLR.IKE»»n 

NEWTOP. (NEXT, OPSTCK) 

O'R N£XT.E**>$ 

NEXT-POPTOP.IOPSTCK) 

W ' R N EXT.A.77K10 .E . 54 K1O 



ARGF.(NEXT) 
^FHHtEEP 
E'L 

PU5H--4NE*T) 

KEEP SAVE DATA RDR, BASE 
SAVE RETUR N 



BASE*POPTOP.(OPSTCK) 

COOE-EVAL.4 0) 

RESTORE RETURN 
RESTORE DATABASE, ROR 

W'R CODE. L-0. OR. CODE. E.$FALSE$, F'N CODE 
O'R N E XT .A . 77X10. E .S 'i KlO 



ARGF.(NEXT) 
O'E 

PUSH. (CONST(NEXT)) 

E'L 

T'O LOOP 
E-«-N 



CONTRL MAD 

EXTERNAL FUNCTION (SPOT)

N'R

INSERT FILE COMMON
E'O RETAC.
STACK(BASE)=SPOT
SPOINT=BASE
F'N

E'O PUSH.

SPOINT=SPOINT+1

STACK(SPOINT)=SPOT

F'N

E'O POP.

SPOT=STACK(SPOINT)

SPOINT=SPOINT-1

F'N 

E'N 
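
As a reading aid, the stack discipline of the three CONTRL entry points (RETAC., PUSH., POP.) can be sketched in Python. This is an illustration only and not part of the original listing: the class name `ControlStack`, the list-based storage and the default size are assumptions, while `stack`, `spoint` and `base` stand in for the STACK, SPOINT and BASE cells kept in COMMON.

```python
class ControlStack:
    """Hypothetical model of the STACK / SPOINT / BASE cells in COMMON."""

    def __init__(self, size=32):
        self.stack = [0] * size   # STACK(0) ... STACK(size-1); size is assumed
        self.spoint = 0           # index of the current top of stack
        self.base = 0             # base cell of the frame being evaluated

    def push(self, spot):
        # PUSH.: advance the pointer, then store the value.
        self.spoint += 1
        self.stack[self.spoint] = spot

    def pop(self):
        # POP.: fetch the top cell, then retreat the pointer.
        spot = self.stack[self.spoint]
        self.spoint -= 1
        return spot

    def retac(self, spot):
        # RETAC.: leave the result in the frame's base cell and
        # discard everything stacked above it.
        self.stack[self.base] = spot
        self.spoint = self.base


s = ControlStack()
s.push(3)
s.push(4)
top = s.pop()              # the operand pushed last comes off first
s.retac(top + s.pop())     # the sum is left in STACK(BASE)
```

Because RETAC. resets SPOINT to BASE, a primitive can return its value without popping its remaining operands one at a time.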



APRIM MAD 

EXTERNAL FUNCTION (DUMMY) 

N'R 

STATEMENT LABEL X 

B'N FIRST,SECOND,ACHECK,BV

V'S NRMBIT=4000000000K 
INSERT FILE COMMON 
E'O L. 
W'R ACHECK.(BSTORE)

BV=FTEMP2.L.FTEMP1
O'E
BV=TEMP2.L.TEMP1

E'L

T'O BSTORE

E'O LE.

W'R ACHECK.(BSTORE)

BV=FTEMP2.LE.FTEMP1
O'E

BV=TEMP2.LE.TEMP1

E'L

T'O BSTORE

E'O EQ.

W'R ACHECK.(BSTORE)
BV=FTEMP1.E.FTEMP2

O'E

BV=TEMP1.E.TEMP2

E'L
T'O BSTORE

E'O GE.
W'R ACHECK.(BSTORE)

BV=FTEMP2.GE.FTEMP1

O'E

BV=TEMP2.GE.TEMP1

E'L
T'O BSTORE

E'O G.

W'R ACHECK.(BSTORE)

BV=FTEMP2.G.FTEMP1
O'E

BV=TEMP2.G.TEMP1

E'L
BSTORE RETAC.(BV)



F'N 
E'O PLUS.
W'R ACHECK.(BACK)

FTEMP1=FTEMP1+FTEMP2

O'E
TEMP1=TEMP1+TEMP2

E'L

T'O BACK

E'O MINUS.

W'R ACHECK.(BACK)

FTEMP1=FTEMP2-FTEMP1
O'E

TEMP1=TEMP2-TEMP1
E'L

T'O BACK

E'O TIMES.

W'R ACHECK.(BACK)
FTEMP1=FTEMP1*FTEMP2

O'E

TEMP1=TEMP1*TEMP2
E'L

T'O BACK

E'O DIVIDE.

W'R ACHECK.(BACK)

FTEMP1=FTEMP2/FTEMP1
O'E

TEMP1=TEMP2/TEMP1
E'L
BACK RETAC.(TEMP1)

F'N 



-R- 



R 
INTERNAL FUNCTION (X)
E'O ACHECK.
POP.(TEMP1)
POP.(TEMP2)
W'R TEMP1.E.$*INC*$.OR.TEMP2.E.$*INC*$
TEMP1=$*INC*$
BV=1B
T'O X
E'L
W'R TEMP1.A.NRMBIT.E.0

FIRST=0B

O'E

FIRST=1B
E'L

W'R TEMP2.A.NRMBIT.E.0
SECOND=0B

O'E

SECOND=1B
E'L
W'R FIRST.AND.SECOND

F'N 1B

O'R FIRST.EQV.SECOND
F'N 0B

O'R FIRST

FTEMP2=TEMP2
O'E

FTEMP1=TEMP1
E'L
F'N 1B



E'N 
E'N 



LPRIM MAD 

EXTERNAL FUNCTION (DUMMY)

N'R

INSERT FILE COMMON

B'N TEMP1,TEMP2,BTEST,BV
E'O AND.

W'R .NOT.BTEST.(0), F'N -1

BV=TEMP1.AND.TEMP2

T'O STORE

E'O OR.

W'R .NOT.BTEST.(0), F'N -1
BV=TEMP1.OR.TEMP2

T'O STORE

E'O EQV.

W'R .NOT.BTEST.(0), F'N -1

BV=TEMP1.EQV.TEMP2

T'O STORE
E'O NOT.

POP.(TEMP1)

BV=.NOT.TEMP1

T'O STORE
STORE



NEXT 



RETAC.(BV)

F'N 0

R 

-r-* ■ — 

INTERNAL FUNCTION (X) 

E'O BTEST.

T1=STACK(SPOINT)

T2=STACK(SPOINT-1)

W'R T1.E.0.OR.T1.E.1

POP.(TEMP1)

O'R T1.E.$*INC*$

POP.(TEMP1)

TEMP1=1B
O'E

F'N 0B

E'L

W'R T2.E.0.OR.T2.E.1

POP.(TEMP2)

O'R T2.E.$*INC*$

POP.(TEMP2)

TEMP2=1B

O'E

F'N 0B

E'L

F'N 1B
E'N

E'N



SYNATS MAD

EXTERNAL FUNCTION (DUMMY)
N'R

B'N BV
INSERT FILE COMMON

E'O PRES.

POP.(TEMP)

LOC=MEMBER.(TEMP,SYMLST,0)
W'R LOC.E.0

RETAC.(MARK)

O'E

BV=TEMP.G.0

SYMCNT=SYMCNT+1

RETAC.(BV)

E'L

F'N

V'S MARK=$*INC*$



E*0 ATTRIB- 
C OO i -0 



LIST.ILST) 
POP.(SYMP) 
POP. (ATT) 
M*R SYMP.L,0 

CODE-SFALSES 
VA L «tF AL SEt 



T»0 RET 
E«L 
LOC»M€MBER. ( SYMPg SYMLST*0) 
CHECKP.(SYMP)

T^O WfcOOA 

R 

R HERE THE 'NORMAL' RESULT OF A TEST IS

R PROCESSED. 

NCONP LIST. (SAVE) 

-P-NSCQNP, J TE-ST ♦ PR 1 0« , SAV&J 

CHECKP.(SNORMS) 
IRALST . ( RL ST) 



F'N RLST 

R 

INJ*RNA1^FUNCT JO*L ^MARJC^ 

E«0 CHECKP. 

W R P . C O 

LIST.(SCRAT) 

^AW¥USCJUT^»PROft4»P t »PittOR^SAV£^tRES ULT t > MARlU^ 

NEMB0T.(LIST.(9),RLST) 

JUK€DL^I^CR*T^B0T T +R|,^T4^ — .. 

IRALST.(SCRAT) 
— fcJLL . 



IRALST.(SAVE) 

fjLfi 

E'N 
EiN- _ 



RELTST HAD 

EXTERNAL F^NCT-WN- -HST-r*llI OR+ 

N»R 

I NS ER T F ILE CO M MO N 



B»N LEMPTY 

-STATEMENT 4.ABEL- -SWTCH 

F»T P, THRESH 

-£4Ui\Ml£MC£-tlP t £l- -. 
R 
R THIS FUNCTION DETERMINES ALL THE TESTS WHICH
R ARE RELEVANT TO THE CURRENT STATE LIST OF 'PRIOR'.
R TESTS WHICH HAVE ALREADY BEEN RUN ARE IGNORED.
R 
--E-AQ -REITS?.. 

COUNT-0 

U S T . tRD RS TK) 



W'R THRESH. G.l. 
SWITCH-RET 

STATE-0 

©LOSS, ( PR+OR, ST ATE4 

T»0 DOl 
-QA€ _ 



SWITCH-LOOP 
EiL 

SR-SEQRDR. (PRIOR) 
LOOP 5?ATE«S£-QLR„{SR,&I4 

W*R SI.E.l 
R« IRALST.(RORSTK) — 



F'N COUNT 
E'L 
IP*SEQLR.(SR,SI) 
W'R P.LE.THRESH, T'O LOOP

qq^ S*R»SE QROR» ( ST ATED 

INLOOP SVNP»SEQLR.tSYR,SYI> 
W ' R S VUC . l 



H'R .NOT.LENPTV.IRDRSTK) 
S VR-P O PTO fr, I RDRSTK > 

T'O INLOOP 
-B*E 

T'O SWITCH 
_E*4s . 



0*R SYI.L.O 

i^e-nnL-esp 

O'R ITSVAL.I*RELATt,SYNP).NE.O 

— ^lEWTOP*^ SY R tRDRS T^^+ 

SYR-SEQRDR.(SYHP) 

T'O I N LOO P 

O'E 
Tr*lTwllOT- 1 SYMPI 

W'R MEMBER. ITEST.CELL! 11 ,ll.NE.O. T'O INLOOP 
^'R-MEMR«UWSf,L*T T 04~NE>O^ M> I N LO OP 

U'R MEMBER.«TEST,TSTRUN.O).NE,0» T'O INLOOP 
NC HR OU i T B ST , L 8 T> — 

COUNT-COUNT* 1 

fMJ-HILOeP 

E'L 



TOPT MAO 

EKTERNAL-FUMCT-teN- -frATEST-fS TAT E 1 - 

N* R 
B » N LE H PTV 



INSERT FILE COMMON 
-pJ-T-LSAVErDSAVErL-S 

EQUIVALENCE ( ILSAVE.LSAVE ) , < ILS.LS ) 
E^D TOPTH. 

SWITCH-I 
LI S T .INQS O Q OI 

T'O START 
E*e-WPT, 

SUITCH-2 
START R-SEO ROR . I IT SVAL^I S VALU E S »,TR€E4>- — 

RET-0 
S T A TC » SC Q LR»IR i ri ■ 

ilsave-seqlr.ir.fi 

DSAVE - LS AVE 

ADO-LPNTR.(R) 
tOOP TEST-SEQtR»-f*rH 

W'R F.E.I, T'O END1SWITCH1 
lLS - SC Q LR . «R t F i 



U*R LS.G.DSAVE 
^TMX SAVE1 S W ITC H4 

O'R LS.L.LSAVE 

STATE* 

LSAVE-LS 
AT E S T - T ES T 



AOD-LPNTR.IR) 

b 

SAVE<2) T'O LOOP 




Wi J|! 1 !l(J',^!#f ! *!-«.'!* 
SAVE(1) NEWTOP.(TEST,NOGOOD)

T*O-440P 

ENDU) W*ft .MOT. LENPTY.INOGOOD) 

RR T « | 



NEttTOP. I NOCOOOt CELL 11)} 

. _£AL 

IKALST.INOQOOO) 
ENOI31 ItKttflW (imt i ifimt liwni u 

REMOVE. I ADO) 
R1*-RE* 



E*N 



-SE1UP MAO- 



EXTERNAL FUNCTION <N1,N2) 



F«T PROS 

••M LSMPTYfNAMTST 

IMS1RT HH COMMOM 



E*0 SETUP. 

- MA NY . t S T R U C T Al^^TlW^L4SJ^lSYMS^L4ST-t T ES T S) 1 - 
MUM-ITSVAU f$HSHNUM$, STRUCT) 

NEUBOT. f L I ST. < 9 1 1 STU ST ) 
M i M SQ T . U I ST .I R ),i Y HS) 



NENBOT.(LIST.(9) v TESTS) 

ML -C0NT-1NUE — 

LIST. (TEMP) 

w*W^9^~ ~ ■ ~^ — ~— — — ^ *w~v#nvv , T i v y^v^^v^^VMPt^v^ MnM|9^~" P~*^^~ H^1R^r~ ~ 

MORD*POPTOP. I TEMP) 
M I R WORO.l. tST A Tlt , 



T«0 ST 
-0 « R M ORO.e. t S Y MPSA- - 

T«0 SVNPL 
0* R MOftP ■E.tTESTSS 

T«0 TESTL 
0«R MORO.F.tWfFIMEt 



OEPtNE.(TEMP) 
— OlR-MORO.E. t SWSTA — 
T-0 SPTST 

- Q*R WQftp.l. tEXdLUSt - 
T*0 EXL 
O H 



PRINT COMMENT * ERROR IN SCO TAPE* 

CMMCOM.(O) . , 

E»L 



* I 

W N A Mi» PO P T P .I Ti MP) — 

ST ATE*L0OKUP. ( NAME • 1 ) 

MEWAl. I »P R 1 » »POWOP->LTEMP-USTATEi— 

STL OOF M*R LENPTY.(TEMP), T*0 LOOP 
S V MP"P OPTOP .t TE M P I 

I PROR»POPTOP. I TEMP t 
M t R FR QR.E. O . Tt Q tU O OR h- 



SYMP*LOOKUP. ( SYNP » 2 ) 
-MANY.44T4 V A1 . . I tME MR E R* . S Y M P USTATE+PROR)- 
NEHBOT. I SYMP« STATE ) 
T'O STLOOP



SYMPt- 



M 'R LC H PTV .I TE MPt, t'O LOO P 

P'T KK V TOP. (TEMP), NTHTOP.(TEMP,2> 

- Y' S KK» t a U>r*K»M 

NAME-TOP. <TE*P> 
— TE5T — LQ0RtJP» t IITHTOr#~i TEMP, 2 1 * rf - 
W*R NAHTST.INAME) 

t v H P-u oonu p , i name, a > 



NEMB0T.(SYM* 9 ITSVAL.(tMEM6ER$,TEST>l 

— VfCNDUf * I I E9 1 V JTBK I 

T'O SYMPL1 



-- EH. 

MEHLSTM TSVAL. ( $MENBERt, TEST) 
MSC Q APR . C N A M E l 



RLOOP 



TESTL 



$YNF*SEQtft.(R,n 
- H 'R F»€v»f-^0-S V HP tl - 
SYNP-LOOKUP. ( SVMP, 2) 

NEVBOT.(SYMPtMENLST) 
T'O R4 P 



M»R LCHPTY.CTEMP), T'O LOOP 
yiiiMTni i TrtiP 1 

COST-MTHTOP . ( TEMP , 2 ) 

UfM **»— ** y mmf I 

IV vk WW9*™*9 m m R PVPVm. I 

MEMBOT.C COST .LOOKUP. I NAME, 3)) 



T ' O TCSTkl 
E'L 

■ ^fcAi M IrJiMf 4 — - 

T* ™ ^R- ■!% UR ■ 1 IVM ■!_ 1 

NEXT»SEQLR.(R,F) 

— M*R-F»E»1» T«0 -T ESTL1 

NENB0T.<C0ST,L00KUP.(NEXT,3)) 
T ' O TfcOOP 



TLOOP 



SYMPL1 



SPTST M'R LEMPTY.f TEMPI, T'O LOOP 

T€ST-L90KU*-4*eP4eP r 4*EMP4r34- 

NEWVAL. I tSPTESTS, S YES*, TEST ) 

T*Q SPTST 

ENOO IRALST.(TEMP) 



F ' N 



?E£H.*-- 



POPTOP.f TEMPI 

POf*l OP» fit" T 

T'O SYMPL 
— PGPTOPwWEWI— 
POPTOP.(TEMP) 



T ' O TC S TL 

U'R LEMPTY.(TEMP), T'O LOOP 

— Tl a TRANSw I PQPTOP, I TcHPrr»"» 
T2-TRANS . I POPTOP. I TEMP » , 3 1 

- -NEWAtTt+EJ&LUSiT** rT*l 

NEHVAL.(SEXCLUSS,T2,T1) 

T ' O CML 



EXL 



E'N 



LOOKUP MAO- 



-EXTERNAL FUNGTHHt-tWOROiLCOOE)— 
N'R 
INSERT FILE COMMON

-6*fl LOOKUP , 

HLIST-0 
A D P -UO C AT E-1Q ) 



H«R ADD.E.O 
--NE4I6QT^-Ll£t«-19-).44LI-ST-1 

LST-BOT.IHLIST) 
— HAJt-LCOOE^^t,. -NEWVAL^^MEMBEIUrL-tST^^J-r LS*^ 

NEWVAL.(tPNAME$,WORD,LST> 

E* L 



F«N LST 
— E*«-TRAN*, 

F»N LOCATE. 101 
^NTtRNAL— FUNCTHMi^X) 

E*0 LOCATE. 

HSM M UH « I T S VA L. U MSHNUM t , S TRUCT) 

HLIST-NTHTOP.INTHTOP.t STRUCT, LCODE>, HASH. <WORD t HSHNUM)+l) 

ROA-SEOROR, tHLlSTJ 

LST*SEQLR.(RDR,I) 
- H*R KE^^F^'4i~0 

W*R ITSVAL.tSPNAME*, LSTI. E. WORD, F*N LST 

TT Q LO OP 



LOOP 



E«N 



INSECT 



MAD 
EXTERNAL FUNCTION (L1,L2,L3)
N'R

B'N LEMPTY

E'O INSECT.



LOO P 



R THIS FUNCTION DETERMINES THE INTERSECTION 
R OF L1 AND L2 AND PLACES THE ANSWER IN L3
R 

RDR=SEQRDR.(L1)

LIST.(RDRSTK)

ELEM=SEQLR.(RDR,I)

W'R I.E.1

W'R LEMPTY.(RDRSTK)

IRALST.(RDRSTK)

F'N

O'E
RDR=POPTOP.(RDRSTK)

E'L

O'R I.L.0

O'R ITSVAL.($RELAT$,ELEM).NE.0

NEWTOP.(RDR,RDRSTK)

RDR=SEQRDR.(ELEM)
D' W M EM B E R. UA BS. ELEM,L 3| Q >. NE .Q 



NEWBOT.(ELEM,L3)

E'L

T'O LOOP
E'N



MEMBER MAD 12/12/66 2321.6 92 00000
EXTERNAL FUNCTION (GOAL,LST,LEVEL)

E'O MEMBER.
ADD=0

RDR=LRDROV.(LST)
DESCND NAME=ADVSWR.(RDR,I)

W'R I.E.1, T'O RETURN

W'R LCNTR.(RDR).L.LEVEL, T'O DESCND

COMPAR W'R NAME.E.GOAL, T'O FOUND
NAME=ADVLWR.(RDR,I)

W'R I.NE.1, T'O COMPAR
ASCEND W'R LCNTR.(RDR).E.0, T'O RETURN

LVLRV1.(RDR)
ADVLNR.(RDR,I)

W'R I.E.1, T'O ASCEND

T'O DESCND

FOUND ADD=LSPNTR.(RDR)

RETURN IRARDR.(RDR)

F'N ADD
E'N



UNDO MAD 

EXTERNAL FUNCTION {LSTJ- 

N'R 



INSERT FILE CO MM O N 

£•0 UNDO. 

R0R^SEQRDR.UTSVAi T t4S*MP-5*rL^m 

LOOP SYMP«SEQLR.IRDR,I) 

WM* I.E.I, F»* 

RDR1=SEQR0R.(PATSTK) 



LOOPl NE XT-S EQ LR - (ROftl t U > 

W'R Il.E.l 

-NEWBOT.4i¥NP^UNAGT&)- 

T'D LOOP 
0^*- MEMBER .< SYMP , I TSVAL^f*SYMPS*rN£XT^ T 1 »*E »0 
T'O LOOP 
qi^ 



T'O LOOPl 



E'L 
E'N 



DSKRD9 MAD 12/01/66 202^.2 LA* QO0O« 

EXTERNAL FUNCTION (FIRST, SECOND, LST> 

N*S INTEGER - 

INSERT FILE COMMON 
D' N I N T(21), N A M Cm,OT H ER(2n 

E'O DSKLST. 

V'S MOOE-l 

T'O START(MOOE) 
---NAME <<H»* JUST, LFTRST) 

NAME(lt*RJUST.( SECOND) 

BF OP EN. < » R $, NAME ( Q > , N AME ll ),B U F U 4 32 ) , B U F 2 < V»2 > ,- , E R R ! 



START (1* 



MODE-2 
START t2> BFREAD. (NAME! 0», NAME! D, INT W>- 
COUNT-LNKR.(INT<0>> 



l r EOF,E0FCT f ERRri- 
BFREAD.(NAME(0),NAME(1),INT(COUNT)...COUNT,EOF,EOFCT,ERR)

jrAH_S»U^H*-F«R-J^0UMT^-l t -+^.E*-4) 

SWITCH OTHER (COUNT-!)- INT II) 

Q T H ER ( 2U« COUN T — 

K-VLIST.tOTHER,LSTI 
WrR_K-*€*-»*IOT-«TV r -TiO-STARTt2J 

F»N K 
EOF BFCLOS>+HAM&+Q>, NAM E( 1 > , ERR) 

MODEM 

Ft N t PQMfc * — 

ERR PRINT COHMENT SGOOF ON READING FILES 

NOOfe-4 

F*N $DONE$ 
E^N 



RELEV MAD 

EXTERNAL FUNCTION (SYMP,LST)

N'R
E'O RELEV.

RDR=LRDROV.(ITSVAL.($MEMBER$,SYMP))
LOOP TEST=ADVLNR.(RDR,I)

W'R I.E.1, F'N 0B
W'R MEMBER.(TEST,LST,0).NE.0, F'N 1B

T'O LOOP
E'N



LOSS NAD 
EX TE RNA L F UNC T ION IA1.A2) 



N*R 

_ -FJ-r-PR^LOSS^NCT^^A^E-t F CO N St yT OT^JU 

B«N LEHPTY 
—EQUIVALENCE I IPJU**lt4-lWCJVWCt4 

D«N PI (10), LOSS! 100, AD) 

WfS AD«2,1,10 



V*S SUBS-O.O, 10, 20.30, 40,50,60, 70,80, 90 

B*N LENPTY 

INSERT FILE CONN0N 
— EAQ-SE4L0** 

RET-0 
L I S T .IBUFFERI 



DSKLST.(A1,A2. BUFFER) 

_ S UE^P O PT OP^TBUFFg*) 

DSKLST.(A1,A2, BUFFER) 
-^ AN-LOOP-r -FOR-i-lr^lrJ-C-SJM- 
U'R LENPTY. I BUFFER), T*0 ERL 
NUM -PO P T O P . I BUFF E R ) 



NAME-TRANS. C POPTOP. I BUFFER ) , I ) 
MUUNAME«E*04-T-*a--ERX 

IPR«ITSVAL.C*PROB$,NAME) 

PHNUH)-PR 

LOOP NEWVAL. ft INDEX $, NUN, NAME) 
NAJUO. 



T*H L0OP1, FOR J»l,l, J. G. SIZE 

I ND-SUBSiJ-) 

M*R DSKLST.Ul.A2, BUFFER). E.SDONE*, T*0 ERL 






WEIGHT MAD

EXTERNAL FUNCTION (LST)
N'R

F'T WGT,ANS,PR
EQUIVALENCE (WGT,IWGT),(PR,IPR)
E'O WEIGHT.

ANS=0.
LOOP STATE=SEQLR.(R,F)

W'R F.E.1, F'N ANS
IPR=SEQLR.(R,F)

IWGT=ITSVAL.($WEIGHT$,STATE)
ANS=ANS+PR*WGT

T'O LOOP
E'N
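
The loop above reads a list in which each state name is followed by its probability and accumulates probability times weight. A minimal Python sketch of the same accumulation follows, assuming a flat alternating list and a plain dictionary in place of the `$WEIGHT$` attribute lookup; the names `expected_weight`, `pairs` and `weight` are hypothetical.

```python
def expected_weight(pairs, weight):
    """Sum PR * WGT over a flat list laid out as [state, pr, state, pr, ...]."""
    total = 0.0
    reader = iter(pairs)
    for state in reader:
        pr = next(reader)             # the probability follows its state
        total += pr * weight[state]   # ANS = ANS + PR * WGT
    return total


result = expected_weight(["A", 0.25, "B", 0.75], {"A": 4.0, "B": 8.0})
```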



FAST HAD 



E XTE RNAL FUNCT I ON I CQNT RLI 
N*R 
-BXN^LENPT¥~ 



INSERT FILE COMMON 
— E-MJ-FAST* 

LIST. (TEMP) 

P R I N T COMM E N T I VPU O R M E t 

R ( T $C6*«, ANS 

--H * R AMS. E .i Y OUt 

CBIT-0 

PR4N T C OMM ENT tC O N T RQ l L^ST4- 

RDLONL.ITEMP) 
DEPTH* PQPTOP . C TEMP I 



THRESH-POPTOP. I TEMP > 
-4IUIiTS«P0PT0P^tTEMP->- ~ 

N01SE«POPTOP.(TEMP) 
- XONT4IL«POP-TOP»f-TEMP-l- - 

PRINT COMMENT SCASESS 

KrUONL.l TEMPI 



NUM'POPTOP.fTEMP) 



CBIT-1 



PRINT COMMENT SHISTRV FILE$ 
R P IO N W . I TE M P I 



«»R .NOT.LEMPTY.t TEMPI 
- -F*L*l»R4UST»4*OPTOPv4TEMP-)+ - 

FILE2»RJUST. I POPTOP. ( TEMP ) 1 
-ASS4GN~*FiLE+rRUF^rBUF24 

E«L 

PRI N T CO M MEN T IC OP i St 

ROLONL. (TEMPI 
^P**QR~POPTOP~UEMPJ 

CPAT*POPTOP.ITENP> 
^ALLPAT«P0PTOPvtTEMP) 

ALLTST-POPTOP. I TEMPI 

CTE S Tw P OPTOP .I T EM P ) 



SIGNS*POPTOP.(T£MP) 

STAN0*POPTOP~4T£MP4 

W*R CPRI0R.E.2.AN0.CBIT.E.1 
PRINT COMMENT $DEPTH, THRESH, HEURISTIC CONTROL$

^PTH-PflPTeP^IRfitONLiirKEM^H -- 

THRESH-POPTOP. ( TEMP ) 
CO N TRfPOPTO P «t TE M PI — 



E'L 

JRAtSf.(TEMfM- 

F'N NUM 

Em- 



FIRMUP MAD 
EXTERNAL FUNCTION (PATSTK)

N'R

B'N SUBSET

F'T P,PR
EQUIVALENCE (PR,IPR)

E'O FIRMUP.
P=0.

R=SEQRDR.(PATSTK)
PAT=R

LOOP NEXT=SEQLR.(R,F)
CHECK W'R F.E.1, F'N P

CURPAT=ITSVAL.($SYMPS$,NEXT)
R1=PAT

LOOP1 CAND=SEQLR.(R1,U)
W'R U.E.1

IPR=ITSVAL.($PROB$,NEXT)
P=P+PR

T'O LOOP
O'R CAND.E.NEXT

T'O LOOP1
O'R SUBSET.(CURPAT,ITSVAL.($SYMPS$,CAND))

ADD=LPNTR.(R)

NEXT=SEQLR.(R,F)

REMOVE.(ADD)

T'O CHECK

O'E
T'O LOOPl 



E'L 
E'N 



SUBSET MAD 12/26/66 1718.4 44 00000
EXTERNAL FUNCTION (L1,L2)
N'R

E'O SUBSET.
R=SEQRDR.(L1)

LOOP NEXT=SEQLR.(R,F)

W'R F.E.1

F'N 1B
O'R MEMBER.(NEXT,L2,0).E.0

F'N 0B
O'E
T'O LOOP

E'L

E'N
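
The test performed by SUBSET. can be restated as a short Python sketch. It assumes ordinary Python sequences in place of the list structures read by SEQLR. and treats the MEMBER. call as a simple membership test, which ignores the LEVEL argument; the function name `subset` is only illustrative.

```python
def subset(l1, l2):
    """Return True when every element of l1 also occurs in l2."""
    for item in l1:
        if item not in l2:    # MEMBER.(NEXT,L2,0).E.0 -> not a subset
            return False
    return True               # reader ran off the end of L1


ok = subset(["fever", "cough"], ["cough", "rash", "fever"])
bad = subset(["fever", "chills"], ["fever", "cough"])
```

As in the listing, an empty first list is a subset of anything, because the end-of-list check succeeds before any element is compared.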
SYMSAV MAD

EXTERNAL FUNCTION (SYMP,TEST)

N'R

INSERT FILE COMMON

E'O SYMSAV.

W'R MEMBER.(SYMP,SYMLST,0).NE.0, F'N

NEWBOT.(SYMP,SYMLST)

UPD.(SYMP)

F'N

E'N 
COMMON FAP

BUF1 COMMON 433

BUF2 COMMON 433

CAIX COMMON — I 

SIGNS COMMON 1 

{j-f* A-| bUrinu" * "■ ~ r ~ 

ALLPAT COMMON 1 

CPRIOR COMMON 1
CTEST COMMON 1
ALLTST COMMON 1
DISEAS COMMON 1
STAND COMMON 1
FILE1 COMMON 1
FILE2 COMMON 1
DEPTH COMMON 1
THRESH COMMON 1
NINITS COMMON 1
NOISE COMMON 1
NODES COMMON 1
BASE COMMON 1
TREE COMMON 1
CURLST COMMON 1
PATLST COMMON 1
STRUCT COMMON 1
SYMCNT COMMON 1
SYMLST COMMON 1
UNACTD COMMON 1
TSTRUN COMMON 1
PATSTK COMMON 1
CODE COMMON 1
OPSTCK COMMON 1
STAC* - COMMON —24- 
UFUNC COMMON 21 

AA6S COMMON U — 

PRIM COMMON 31 
CONST— XQMMON 34— 
SPOINT COMMON 1 
NPR4N- COMMON — 34 — 
CELL COMMON 21 



MACROS FAP

* STACK MANAGEMENT MACROS

PUSH MACRO ARGS
IRP ARGS
TXI *+1,1,1
CLA ARGS
STO STACK,1
IRP
PUSH END

POP MACRO ARGS
IRP ARGS
CLA STACK,1
STO ARGS
TXI *+1,1,-1
IRP
POP END

* LIST READING MACROS HERE

SEQRDR MACRO A,B
CLA* A GET LIST HEADER
STO B STORE IN READER CELL
SEQRDR END

SEQLR MACRO A,B,C
LAC B,4 READER LINK
CLA 1,4 GET DATUM FOR CELL
STO A SAVE DATUM
CLA 0,4 ADVANCE READER
STO B
ANA =0700000 SET FLAG
ARS 15
SUB =1
STO C
SEQLR END



-NTEST E+P 
ENTRY 

NAMTST SX* 
CLA» 

STO 



NAMTST 
SV4,A 
1»4 
CA N O 



TSX 
TXH 
STO 
- £L A 
SSP 
ST-A- 



ARS 
CAS 

TRA 
TRA— 
TRA 
-GfcA 



iGETMEM.4 

LIMIT 
GAND 

LI N K 



18 

LINK 

NO 

•♦2 

NO 

LI N K 



CAS 
TRA 
TRA 
CLA» 
STO 
-ANA — 



CAS 
TRA 
TRA 
TRA 
CLA 
-ARS- 



CAS 
T*A 
TRA 



LIMIT 

NO 

• ♦1 

LINK ■■- 

HEAD 

-0700000 



•0200000 

NO 

*+2 

NO 

HEAD 

-W 

LIMIT 

NO 

•♦1 
TRA
CAN& PIE- 
HEAD PZE 

LTMK Pi* 

LIMIT PZE 
6W0- 



SLF FAP 

* DEPTH TO OBTAIN THE BEST TEST TO RUN. THE ROUTINE

* 'GROW1' IS USED TO GROW NEW BRANCHES ON THE TREE IF

* NECESSARY.



* STACK MANAGEMENT MACROS
PUSH MACRO ARGS
IRP ARGS
TXI *+1,1,1
CLA ARGS
STO STACK,1
IRP
PUSH END

POP MACRO ARGS
IRP ARGS
CLA STACK,1
STO ARGS
TXI *+1,1,-1
IRP
POP END

* LIST READING MACROS HERE
*
SEQRDR MACRO A,B
CLA* A GET LIST HEADER
STO B STORE IN READER CELL
SEQRDR END

SEQLR MACRO A,B,C
LAC B,4 READER LINK
CLA 1,4 GET DATUM FOR CELL
STO A SAVE DATUM
CLA 0,4 ADVANCE READER
STO B
ANA =0700000 SET FLAG
ARS 15
SUB =1
STO C
SEQLR END
ENTRY SEQDEC
SEQDEC SXA RET,1

SXA RET+1,2

SXA RET+2,4

* INDEX REGISTER 1 IS THE POINTER TO THE TOP OF THE STACK

LXA ZERO,1

* INDEX REGISTER 2 IS THE LEVEL COUNTER FOR THE SEARCH



LXA 



Z E RO , 2 



CLA» 


It* 


STfr 


LIST 


CLA» 


3,« 


STO 


STATE 


CLA 


DEPTH 


SUB 


ONE 


ALS 


18 


STO 


LTEST^ 


TSX 


$ITSVAL,4 


T XH 


VALUED 


TXH 


LIST 



VALU^ LT5T FO* TOP LEVEt 



S TO 



V A LUES 



* THIS IS THE MAIN SEARCH LOOP.

*

* FIRST GET THE DECISION LOSS OF THE CURRENT PRIOR
LOOP CLA NODES COUNT DECISION NODES 



ADD 



-*^ 



STO 
TS* 

TXH 

STO 

CLA 



NODES 

*ITSVAL T 4 

PRIOR 

LIST 

PLIST 

STAT C 



GET-D4STRIBUT*ON FOR THTS- NODE 
SAVE NAME OF PRIOR LIST 



STO 
NOTERM TSX 
TXH 
TXH 
STO 
T*H- 



DECIDE 
IDLOSS,^- 
PLIST 
-DECIDE 
LSAVE 
LT E ST . 2 . 



OECTSTQN LOSS- FOR DISTRIBUTION^ 

DECTDE-NAME 



TSX 

TXH 

TXH 

TXH 

LTEST TXL 

T*E- 



SMANY.4 

VALUES 

DECIDE 

LSAVE 

DDWN.2,** 

CO N Tl N t 2 i 1 



SAVE DECISION VALUES IF AT LEVEL ZERO 



CHECK LEVEL AGAINST DEPTH 
RETUR N 



* HERE THE LEVEL IS LESS THAN THE REQUIRED DEPTH.

* THE TREE IS DEVELOPED TO THE NEXT LEVEL AND THE SEARCH
* CONTINUES.



eewN- 



~r**- 



tRCLTST ,^ 

LIST 

PLIST 



GCT R ELEVANT TESTS FO R T H IS LEVEL 



TXH 
TXH 



* PROCESS THE BRANCHES AWAY FROM THE NODE DENOTED BY 'LIST'.

* EACH BRANCH CORRESPONDS TO A DIFFERENT TESTING ALTERNATIVE

* AT THE NODE DENOTED BY 'LIST'.



5EQROR 
SEQLR 



LIST,RO* 
TEST.RDR, I 



ESTABLISH READER -FOR LIST 
GET NEXT TEST 
READ CLA I

CAS ONE

TRA NOHEAD NOT A HEADER
MA **3 



TRA NOHEAD 
XXH **2«-2»0 

TRA RET END 

T«. GONT**,2W HQT-THE-ENO-OF-*HE^#lALVS*S 

*

* PROCESS A SINGLE TEST BRANCH HERE

*

NOHEAD CLA ZERO

STO ELOSS EXPECTED LOSS FOR THIS TEST
TSX $GROW1,4 GROW RESULT LIST FOR THIS TEST

TXH RDR
TXH PLIST

STO RESLST NAME OF RESULTS LIST
* SAVE VARIABLES HERE

PUSH (RDR,LSAVE,RESLST,PLIST)
TSX $NEWTOP,4 PUT THIS ITEM ON TEST STACK

TXH TEST

TXH TSTRUN

*

* PROCESS ALL POSSIBLE RESULTS OF THE TEST CURRENTLY BEING

* EVALUATED.

*

SEQRDR RESLST,RDR1 READER FOR RESULTS LIST
READ1 SEQLR LIST,RDR1,I1 GET LIST FOR NEXT RESULT

CLA I1 CHECK FOR HEADER
CAS ONE

TRA GOON
TRA *+2 HEADER

TRA GOON



* ALL RESULTS FOR THIS TEST PROCESSED. RESTORE VARIABLES FOR TEST

* EVALUATION.

*

POP (PLIST,RESLST,LSAVE,RDR)

TSX $POPTOP,4 GET THE TEST NAME
TXH TSTRUN

STO TEST
TSX $BOT,4 GET TEST COST

TXH TEST
FAD ELOSS COMBINE WITH ELOSS

STO ELOSS
TXH CHECK,2,0



TSX 


SHANYt* 


TXM- 


VALUES 


TXH 


TEST 


TXH 


ELOSS - 


CHECK CLA 


ELOSS 



SAVE VALUES IF LEVEL IS ZERO 



-**«- LSAVE IS T H I S T HE B E S T T O PA T i 

TPL DEL NO 
CLA - ELOSS BEST SO FAR 

STO LSAVE 
DE4 S*QLR--TEST,ROR^ _..._ .REMOVE THIS BRANCH 

LXD RDR, 4 
SXA TEMP-** — 



TSX 


(REMOVE, 4 


TXH 


TEMP 


TRA 


READ 
* SAVE VARIABLES

GOON PUSH (RDR1,ELOSS,LIST)

*w toop T ^t -evcte- 



* FOLD THIS BRANCH BACK IN TERMS OF EXPECTED VALUE



CONTIN POP (LIST,ELOSS,RDR1) RESTORE VARIABLES

TSX $ITSVAL,4
TXH PROBQ GET PROBABILITY OF THE BRANCH

TXH LIST

STO PROB

LDQ LSAVE EXPECTED LOSS
FMP PROB



FAD 


ELOSS 


5TO- 


--EL0S& 


TRA 


READ1 



-R€4 A*f 

AXT 

-AXT 

TRA 


»M 

••,2 

• •.4 


1,* 


• 

TESTQ — B€4 


1,TEST 



PLIST 

MHtt B€+ trPR+OR- 

TEMP 

TEST 

RDR 

ROR1 



I 

-H 

LIST 

ELOSS 

ONE OCT 
R E SLST 



LSAVE 

ZERO OCT & 

PROB 

PR08Q BCT l . PRGB 

STATE 

DEC I DC — 



VALUEQ BCI LVALUES 
VALUES 

INSERT COMMON COMMON PACKAGE 

END
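
The comments in the routine above describe a depth-limited search: take the decision loss of the current prior, then for each candidate test fold the losses of its possible results back as an expected value, add the test cost, and keep the best alternative. The Python sketch below illustrates that idea under simplifying assumptions that are not taken from the listing: every test has only two results, likelihoods are given as P(positive | state), and all names (`decision_loss`, `update`, `lookahead`, the example data) are hypothetical.

```python
def decision_loss(prior, loss):
    """Smallest expected misdiagnosis cost if a diagnosis is made now."""
    return min(sum(loss[d][s] * p for s, p in prior.items()) for d in loss)


def update(prior, like, positive):
    """Return (probability of this result, posterior over states)."""
    joint = {s: p * (like[s] if positive else 1.0 - like[s])
             for s, p in prior.items()}
    total = sum(joint.values())
    if total <= 0.0:
        return 0.0, {}
    return total, {s: v / total for s, v in joint.items()}


def lookahead(prior, tests, loss, depth):
    """Return (expected loss, best test); best test is None when deciding now is best."""
    best_loss, best_test = decision_loss(prior, loss), None
    if depth == 0:
        return best_loss, best_test
    for name, (cost, like) in tests.items():
        remaining = {t: v for t, v in tests.items() if t != name}
        expected = cost                      # test cost is added to the folded loss
        for positive in (True, False):
            p_branch, posterior = update(prior, like, positive)
            if p_branch > 0.0:
                sub_loss, _ = lookahead(posterior, remaining, loss, depth - 1)
                expected += p_branch * sub_loss   # fold the branch back
        if expected < best_loss:
            best_loss, best_test = expected, name
    return best_loss, best_test


prior = {"A": 0.5, "B": 0.5}
loss = {"A": {"A": 0.0, "B": 10.0}, "B": {"A": 10.0, "B": 0.0}}
tests = {"T": (1.0, {"A": 1.0, "B": 0.0})}   # cost 1, perfectly discriminating
now = lookahead(prior, tests, loss, 0)
one_step = lookahead(prior, tests, loss, 1)
```

In this toy case deciding immediately costs 5 on average, while running the test first costs only the test fee, so the one-level search selects the test.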



UPD1 FAP 

* THIS FUNCTION UPDATES THE PRIOR DISTRIBUTION IN

* 'LST1' BASED ON THE SIGN 'SYMP'. THE NEW DISTRIBUTION

* IS STORED IN 'LST2'.

* -- ... ... _ 

ENTRY UPD1 
INSERT MACROS
SUBST MACRO READER,DATUM
LAC READER,4
CLA 0,4
ARS 18
PAC 0,4
CLA DATUM
STO 1,4
SUBST END
UPD1 SXA RET,4



CLA» 

sro 

CLA« 
STQ 
CLA« 
S T Q 



1,4 

S¥MP 

2,4 

LSU 

3,4 

L S T2 



D4F 
SAMS 



CAS 
IRA 
TRA 

TRA 



LST1 

DIF 

SAME 

SWITCH 

• ♦2 

S WIT C H 



CLA 
5TG 
TSX 
TXH 
TXH 
-S*0- 



FZERO 
P 

SITSVAL.4 
-HEMQ 
SYMP 
H E M L S T 



UQQP 
CHECK 



AEX- 



SEQRDR 

SEQLR 

CLA 

CAS^ 

TRA 
-TJU 



LSTl.RDR 
SI ATE, RDR, I 
I 

ONE 
MORE 
_*+2 



TRA 

CLA 

FSB 
IPL 
CLA 
AXT 



MORE 

_ p_ 

FZERO 
NOZERO 
P 
RET, 4 



TRA 



4,4 



NOZERO 
AGAIN 



SEQRDR 
SE4JL-R- 
CLA 
£AS 



LST2.RDR 
_SXATE t ROR, I 

I 
-ONE 



TRA 
TRA 
SEQLR 
CLA 
FDP 
S*4} 



• + 2 
RET-i 

PR, RDR, I 

PR 

P 

PROB 



SUBST 
TRA 



RDR, PROB 
AGAIN 



MORE 



SEQLR 
TSX 
-f-XH 



PROB, RDR, I 
*MEMBER,4 
STATE 



TXH 
TXH 
TNZ 



MEMLST 
ZERO 
* + 4 
CLA FZERO
STO PR 
TRA STEST 
H»A£ Gt4- — 



CLA 0,4 

PAC -&,A 

CLA 1,4 
STO PR 
TSX SPIJ.4 
-TXH S¥*P 



TXH PR 

STO ~PR 

STEST CLA SYMP 
*M_ MULT 
CLA »1.E0 

FSB PR 



STO PR 

MULT tOfl PRO& 

FHP PR 

STO PRGB 

FSB FZERO 

W= **2 



TRA SCRAP 

^LA P^ 

FAD PROB 
STfl P 
ZET SWITCH 
^T-ftA *+2 



TRA DIFPRO 
SOftST ROR^PROB^ 
TRA LOOP 
« - - 

DIFPRO TSX $MANY,4 
T-XM bS« 



TXH STATE 
TXH PRO* 
TRA LOOP 



SC RAP HI* S WITC H 



TRA LOOP 
LAC ROR,A 
CLA 0,4 

^ARS t8 
STA ADD 

-6tA R&R 



ARS 18 

STA ADD1 

SEQLR STATE,RDR,I 

TSX *REM0VE,4 

TXH ADD 

-TSX t R EH OV Ef 4 

TXH ADD1 

TRA CHECK 



FZERO OCT 233000000000 
-OWE OG* 1 



ZERO OCT 

SWITCH 

RDR 
STATE

I 

LSTl 

UU 

SYMP 

MEMO SCI 1,ME*BER-- 

ADD 

AOOl 

P 

P* 

PROB 

ME MIST 

END 
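
The update performed by UPD1 is, in outline, a Bayesian revision of the state probabilities: each prior probability is multiplied by the probability of the observed sign given that state (or by one minus it when the sign is reported absent), states left with zero probability are dropped, and the remainder is divided by the accumulated total. The Python sketch below illustrates this under assumptions of its own: dictionaries replace the list structures, a boolean `present` replaces the sign bit tested in the listing, and the example names and numbers are invented.

```python
def upd1(symp, present, prior, likelihood):
    """Return (P(observation), posterior) for one observed sign.

    likelihood maps (sign, state) to P(sign | state); missing pairs count as 0.
    """
    joint = {}
    for state, prob in prior.items():
        pr = likelihood.get((symp, state), 0.0)
        if not present:
            pr = 1.0 - pr            # negative result: use the complement
        value = prob * pr
        if value > 0.0:              # states ruled out are scrapped
            joint[state] = value
    total = sum(joint.values())      # accumulated probability P
    if total == 0.0:
        return 0.0, {}
    return total, {s: v / total for s, v in joint.items()}


prior = {"flu": 0.5, "cold": 0.5}
likelihood = {("fever", "flu"): 0.75, ("fever", "cold"): 0.25}
p_pos, post_pos = upd1("fever", True, prior, likelihood)
p_neg, post_neg = upd1("fever", False, prior, likelihood)
```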



ENTRY MEMBER 

IHS6RT MACROS - 

MEMBER SXA RET ( 4 
SXA RET* 1.1 



CLA» 1,4 

«0 GOAL 

CLA* 2,4 

-4M UST 

CLA* 3,4 



CLA LIST 
ST* NEXT 

TSX ONCE.l 
RET AXT- **,4 

AXT «,1 
UU 4*4 



LEVEL1 SEQROR LIST.ROR 
LOOP SEQLR NEXT, RPR, 1 

CLA I 
CAS ONE 

TRA GOON 
-TRA *** 

TRA GOON 



TRA RET 

GOON TSX O NCE 1 1 



TRA LOOP 



vPHrC vCvnvn fit Al |n 

OLOOP SEQLR CANO,R f F 
CLA F 



CAS ONE 

-TRA MORE 

TRA *+2 

IO.A HQDC 

ZAC 
-TRA 1*4- 



MORE CLA CANO 

_CAS GOAL 

TRA OLOOP 
TRA *+2

tra etao* 

LAC R,4 
GfcA 8t4 



ARS 18 

-AHA »077777 

TRA RET 



CAND 
NE XT 



RDR 
I 

R 

F 

GOAL 
LIST 



ONE OCT 1 
END 



UN FAP

ENTRY UPD1

ENTRY NSCOMP

INSERT MACROS

SUBST MACRO READER,DATUM
LAC READER,4
CLA 0,4
ARS 18
PAC 0,4
CLA DATUM
STO 1,4
SUBST END

MCHECK MACRO LAB1,LAB2,FLAG
CLA FLAG
CAS ONE
TRA LAB1
TRA LAB2
TRA LAB1
MCHECK END



* 'UPD1' DOES THE STANDARD UPDATE OF LST1 INTO LST2

* WHEN A SYMPTOM IS THE 'AGENT'.

*

UPD1 STI INDIC

RIR 17 SAVE AND RESET INDICATORS

TRA START



* 'NSCOMP' DOES THE NORMAL UPDATE WITH A TEST AS THE AGENT.

*

NSCOMP STI INDIC

RIR 17

SIR 1
START SXA RET,4

SXA RET+1,1
CLA* 1,4 'AGENT'

STO AGENT

CLA* 2,4 FIRST LIST

STO LST1
CLA* 3,4 SECOND LIST
STO LST2

CAS LST1 SAME LIST.Q.
TRA *+2 NO

SIR 2 YES

CLA ZERO

STO P

TSX $ITSVAL,4

TXH MEMQ
TXH AGENT GET MEMBER LIST OF AGENT

STO MEMLST
* PROCESS EACH STATE ON LST1.



SEQROR LSTl t ROR 
-LOOP SE4LR STATE , R P R, I 



CHECK HCHECK MORE, NORM, I 

CLA P 
RET- AX* »*t4- 

AXT »•,! 
LPJ IN P I C 



TRA 4,4 



NOR*--£EQRORr -LST^RDR -NORMAL 4XE--L- ST£ — 

AGAIN SEQLR STATE,ROR,I 

HC HEC K D IV .RE T-l . I 



OIV SEQLR PR,RDR«I 
CLA- PR 

FOP P 
ST* PRQfr 

SUBST RDR.PROB 
TRA AGA4N 



MORE SEQLR PR0B*RD&*4 

RFT 1 TEST PROCESS SNITCH 

XRA NC ANSCOMP-* - 

TSX GETP.l GET P( AGENT/STATE) 

TJW MEMLST 



TXH PR 
£LA AGENT CHECK FOR NEGATIVE RESULT 

TPL MULT 
£Uk =4~ES 

FSB PR 



-S*0 PR- 



MULT CLA -l.E-6 CHECK FOR 'ZERO* PROB 
ESB PR 



TMI OK 
RTEST RFT -2 

TRA SCRAP 
TRA L OO P 



OK LOO PROB 

FMP PR 

STO PROB 

.. ._ GL-A P ._... 

FAD PROB ACCUMULATE PROBABILITY 



S TO 



RFT 2 AGAIN TEST LISTS 

TRA SAME 

TSX $MANY,4 
TXH LST2

_ Twit fT irr _ 

mn 31NIC 
TXH PROB 

-^ra- tee? — 



SANE SUBST RDR.PROB 



NG GL-A *"1 *c0 ffrl T I AL lite PR — - 

5T0 PR 

SE4RM — H C H LST . R READ H EH B ER LIST OF TE S T 

NLOOP SEQLR AGENT, R,F NEXT SIGN 

_ u uec « fnny Mill T F . 

nvnCvn uuuni nut • 1 1 
GOON TSX SITSVAL,* MENBER LIST OF SV"P 
flW NEMO 



TXH AGENT 
-SW SVMCM- 



TSX GETP.l 

_ tmu C-WMCM. 

T1W1 JT TCB 

TXH TENP 
-GLA PR 

FSB TEMP 
-SW PR 



FSB "l.E-6 TEST FOR ZERO 

jj^t HLBB9 

TRA RTEST 

« 

* GET THE PROBABILITY OF A SIGN GIVEN A SYMP



GETP CLA* 1.1 
ST* NOLO 

TSX $NENBER ( 4 
TX+f STAT^ 

TXH HOLD 
WW tt*G 



TNZ •♦* FOUND 

GL-A Z-ERO 

BACK STO» 2,1 STORE RESULT 
^RA J,4 

PAC 0,4 GET PROB CELL 
CtrA G>4 



PAC 0,4 
-GL-A 4r,4 

STO HOLD 
_ -t-T-A^ RIGHT 

ARS 18 FAST CHECK FOR NAME 
-ANA ■07TT77 

CAS RIGHT 

_ TBi ******* m unfA uiMC 

f km ifmvivn nu i « unnc 

TRA »*2 

_ TQ A ur*m M 

TSX $PIJ,4 POSSIBLY A NAME 
-t*H AGENf 



TXH HOLD 
NONAM CLA HOLD 

GGRAP — WW RDfcrt- 



SXA 


ADO, 4 


SEGLR- 


-ST*TE,ROR»I — 


LXD 


RDR,4 
SXA
TXH 



ADD! ,4 

ADO 
IRiHQ V i,* — 



TXH 



AOD1 
— CMCCK- 



SYMEM 

ADD 

APOl 



INOIC 
ROR-- 

R 

I 

F 
S T A T E 



LST1 

WW 

RIGHT 

HOLD 

TEMP 
MEMLST 



MEMQ aCI 

£JMW -OCT 

AGENT 

p 

PR 

PJUU 



1, MEMBER 
_0 



ONE 



OCT 